
January 25, 2003

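qmail vs. Postfix: a performance comparison

The common view has long been that qmail is fast, much faster than sendmail, and some people even boast that it is faster than Postfix. To get a more objective picture, it seemed worth benchmarking qmail against Postfix, if only to provide a baseline for choosing a mail system.

I installed the qmail RPM package provided by hleil on a second machine configured identically to the one used for the Postfix tests, and asked him a few questions about qmail along the way. My thanks to him.

1. Environment
Two identically configured machines: dual PIII 933 + 512 MB RAM + U160 SCSI disks, running qmail-1.03-7. The systems were tuned: fs.file-max was raised to 65535, the usual resource limits were lifted, and /var was placed on its own partition.

I first tested qmail with smtp-source (the load generator shipped with Postfix alongside smtp-sink), delivering to system accounts. The result was 26 messages/second, which I found disappointing.

hleil pointed out that my SMTP test only exercised smtpd and never really stressed the queue, because SMTP speed is the real bottleneck.

2. Quick results
Delivering to system accounts, with smtp-source configured to send 1000 messages over 10 concurrent sessions, the run took 56 seconds. With 100 concurrent sessions the test failed; after raising qmail's concurrency limit it completed cleanly, again at roughly 26 messages/second.

Next I used postal with 50 concurrent connections, one message per connection, and qmail's concurrency limit set to 400. The result topped out at about 1150-1200 messages/minute no matter what I tried. top showed kjournald appearing frequently, so writes were probably consuming a lot of resources; based on the Postfix results I suspected multilog was the main cost. I therefore disabled multilog so that qmail ran with no logging at all, and tested again: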

[root@ns2 postal-0.61]# ./do.sh
time,messages,data(K),errors,connections,SSL connections
14:52,2029,2767,0,2078,0
14:53,2056,2684,0,2056,0
14:54,2061,2785,0,2062,0
14:55,2022,2686,0,2021,0
14:56,1998,2684,0,1998,0
14:57,2006,2666,0,2007,0
14:58,1971,2644,0,1971,0
14:59,1971,2584,0,1970,0
15:00,1942,2596,0,1943,0
15:01,1984,2655,0,1984,0
15:02,1957,2567,0,1957,0

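Throughput clearly rises to about 2,000 messages/minute, roughly 1.7 times the earlier figure, but some runs hit a "could not creat qq" (queue) error whose cause I could not determine. Perhaps qmail lacked the resources to create queue files, or the injection rate was simply too high for the queue?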
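The average rate of a Postal run can be checked by parsing its per-minute report. The following is a minimal sketch in Python, which the original test did not use; the function name `parse_postal` and the dictionary keys are illustrative choices, and the data is copied from the run above.

```python
import csv
import io
from statistics import mean

# Per-minute report of the run without multilog, copied from the output above.
POSTAL_OUTPUT = """\
time,messages,data(K),errors,connections,SSL connections
14:52,2029,2767,0,2078,0
14:53,2056,2684,0,2056,0
14:54,2061,2785,0,2062,0
14:55,2022,2686,0,2021,0
14:56,1998,2684,0,1998,0
14:57,2006,2666,0,2007,0
14:58,1971,2644,0,1971,0
14:59,1971,2584,0,1970,0
15:00,1942,2596,0,1943,0
15:01,1984,2655,0,1984,0
15:02,1957,2567,0,1957,0
"""


def parse_postal(text):
    """Return one dict per data row, skipping the header and any other line."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        # Data rows have six fields, an HH:MM timestamp and a numeric count.
        if len(fields) != 6 or ":" not in fields[0]:
            continue
        if not fields[1].strip().isdigit():
            continue
        rows.append({
            "time": fields[0],
            "messages": int(fields[1]),
            "kbytes": int(fields[2]),
            "errors": int(fields[3]),
            "connections": int(fields[4]),
        })
    return rows


rows = parse_postal(POSTAL_OUTPUT)
average = mean(row["messages"] for row in rows)
print(f"{len(rows)} minutes, average {average:.0f} messages/minute")
```

Against the earlier 1150-1200 messages/minute, this average corresponds to a gain of roughly 1.7 times.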

Next I converted the /etc/passwd information to cdb to speed up user lookups:
qmail-pw2u < /etc/passwd > /var/qmail/users/assign
qmail-newu
and tested again, with the following result:

[root@ns2 postal-0.61]# ./do.sh
time,messages,data(K),errors,connections,SSL connections
16:42,2118,2858,0,2167,0
16:43,2095,2788,0,2096,0
16:44,2114,2790,0,2114,0
16:45,2245,3002,0,2244,0
16:46,2233,2971,0,2233,0
16:47,2165,2891,0,2166,0
16:48,1977,2637,0,1976,0
16:49,1999,2645,0,2000,0

Injection appears to be about 5%-10% faster.
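The size of the gain can be checked from the two message columns. This is a small Python sketch; the helper name `percent_change` is made up for illustration, and the two lists are the per-minute message counts of the runs before and after the cdb conversion.

```python
from statistics import mean

# Messages per minute, copied from the two Postal runs above.
without_cdb = [2029, 2056, 2061, 2022, 1998, 2006, 1971, 1971, 1942, 1984, 1957]
with_cdb = [2118, 2095, 2114, 2245, 2233, 2165, 1977, 1999]


def percent_change(old, new):
    """Relative change of the mean rate, in percent."""
    return (mean(new) - mean(old)) / mean(old) * 100.0


change = percent_change(without_cdb, with_cdb)
print(f"mean rate changed by {change:.1f}%")
```

Averaged over the whole run the gain sits near the low end of the 5%-10% range, because the last two minutes of the second run fall back to the earlier level.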

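hleil's view of these results: this test measures SMTP injection and cannot really stress the queue. As observed above, the queue was probably not under any particular load, and even the failure to create queue files may have some other cause. Only a real MX-to-MX environment gives meaningful numbers, and a real SMTP transaction is more complex than the one tested here.

Note: I do not remember the original wording clearly; this is only a paraphrase of what hleil said.

At the time I felt the test was still of some use. First, it exercised a mail DoS scenario: under a DoS attack, whatever approach the MTA takes, all that matters is that it holds up. If qmail's queue really crashes as easily as it did in my test, that is fairly dangerous.

The problem is probably not that serious, though. qmail has a large user base; if DoS attacks really were taking down that many qmail installations, a fix would surely exist.

Second, logging. Throughput rose sharply with logging removed, which shows that logging overhead must be kept as low as possible. A production system has to log, however, so this has to be solved rather than avoided.

In addition, qmail's queue design creates at least three files in the queue for every message it handles, so write I/O is much more frequent (in top, kjournald used noticeably more CPU than in the Postfix test environment). This can become a bottleneck, especially when the log is not written asynchronously; simply switching to asynchronous log writes improves speed considerably, or at least makes injection much faster.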

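The 2003 results show the following:
1. They compare SMTP injection speed only.
2. They mainly reflect each MTA's I/O cost and its ability to handle concurrency under heavy traffic.
3. They suggest that Postfix issues the fewest fsync() calls and qmail considerably more. Architecturally, qmail needs a file system that commits data to disk immediately (sync); with softupdates or async mounts, mail can be lost on a system crash or power failure.

In other words, at the bottleneck, on the same file system and disk subsystem, Postfix uses fewer resources. If the CPU is fast enough that the CPU time spent processing the queue can be ignored, the Postfix queue should be both faster and more stable than qmail's.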
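The cost difference between one and three queue files per message can be illustrated with a toy experiment. The Python sketch below is not qmail's or Postfix's queue code; it only assumes that each queue file is created, written, forced to disk with fsync() and later deleted. The names `enqueue` and `run` are hypothetical.

```python
import os
import tempfile
import time


def enqueue(queue_dir, msg_id, payload, files_per_message):
    """Store one message as N files, fsync each, then delete them.

    Returns the number of fsync() calls made.
    """
    paths = []
    syncs = 0
    for part in range(files_per_message):
        path = os.path.join(queue_dir, f"{msg_id}.{part}")
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # wait until the data has reached the disk
        syncs += 1
        paths.append(path)
    for path in paths:
        os.remove(path)  # delivery finished: the queue files go away again
    return syncs


def run(messages, files_per_message, payload=b"Subject: test\r\n\r\nbody\r\n"):
    """Queue `messages` messages in a scratch directory; return (fsyncs, seconds)."""
    with tempfile.TemporaryDirectory() as queue_dir:
        start = time.perf_counter()
        syncs = sum(
            enqueue(queue_dir, i, payload, files_per_message)
            for i in range(messages)
        )
        return syncs, time.perf_counter() - start


if __name__ == "__main__":
    for files in (1, 3):
        syncs, seconds = run(50, files)
        print(f"{files} file(s)/message: {syncs} fsync calls in {seconds:.3f}s")
```

The fsync count scales with the number of files per message; how much wall-clock time that costs depends on the disk and file system, so the timings are only meaningful when both variants run on the same machine.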

Finally, qmail's queue crashed during testing, while Postfix's never did.


Posted by hzqbbc at 08:22 AM | Comments (1)

January 18, 2003

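The D. J. Bernstein and Wietse Venema dispute over qmail

Two experts trading blows. An MTA does not have that much to do, and it hardly needs to do everything itself. Limiting the size of the SMTP log is, in my view, the logging system's job. Then again, it is no bad thing if the MTA can enforce the limit itself, as qmail's companion tools do.

Wietse's email, responding to the slander written by DJB: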
http://groups.google.com/groups?q=artificial+limit+postfix+log+group:mailing.postfix.users&hl=zh-CN&lr=&ie=UTF-8&inlang=zh-CN&selm=9t130n%2412ha%241%40FreeBSD.csie.NCTU.edu.tw&rnum=1

DJB's account of Venema:
http://cr.yp.to/qmail/venema.html

A message from another mailing list, which includes a simple patch:
http://lists.q-linux.com/pipermail/plug-misc/2001-November/000963.html

Posted by hzqbbc at 07:18 PM | Comments (0)

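Why Postfix is faster than qmail on sufficiently good hardware

In my own tests I also found Postfix faster than qmail under reasonably equal conditions. The main reason is disk I/O: in principle Postfix needs only about one third of the disk I/O that qmail does, so in principle it should be about three times as fast.

Wietse's explanation follows.

These are the Postfix author's own words. As far as my limited knowledge goes, the more writes hit the disk, the lower the performance, so this rings true.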
===================================================================
From: Wietse Venema (wietse@porcupine.org)
Subject: Re: performance tuning
Newsgroups: mailing.postfix.users
Date: 2001-02-15 17:34:07 PST


jet:
> I'm trying to dump e-mails into postfix via smtp, but can only achieve
> speeds of about 15-20 mails/second (locally).
>
> I've tried spawning multiple outgoing-mail scripts, and tweaked with them,
> but always end up in the 15-20 mail/second range.

This is pretty much single disk Postfix performance. Most other
mailers are much, much, worse than this.

The bottle neck is your disk.

When I first tested Postfix against qmail, Postfix was 3x faster
on local and LAN benchmarks because Postfix uses one queue file
where qmail uses three. The create/delete operations are much,
much, more time consuming than reading and writing the content.

In order to make Postfix fast you need to avoid queue file
create/delete operations. Multiple recipients per queue file are
a big win. With one recipient per queue file Postfix is starved
because the disk is too slow.

Postfix does mostly random writes, and it actually has to wait
until a file is safe on disk. It's not allowed to wait until dirty
blocks are synced automatically.

In order to process more mails per unit time, you need faster disks.
Well, that is not possible.

You can, however, spread the load over multiple spindles. Don't
use RAID5 because it still has single-disk performance for applications
that produce random writes like Postfix does.

Another possibility is to use non-volatile RAM of the type that
used to be sold for SUN file servers. It speeds up disk access of
random writes by an order of magnitude, because writes can be sorted
for speed without loss of reliability. Next to solid-state disks
it is the best thing you can do.

Wietse

--

>My test machine is a:
> AMD Athlon Thunderbird 800 mHz
> 128 megs memory

not enough memory for sending 50K msgs/hour. You'll need 512 or more
to get 400+ SMTP processes into memory.

You will also have to increase maxfiles and maxfilesperprocess with sysctl.

I have a client with a listar announcement list machine hitting 40
msgs/hour on a P500 + 512 megs RAM and just one IDE disk. I think 50
is reachable, but 40 is more than enough for him.

> UDMA66 IDE drive

Better to have two IDE master drives on two separate IDE channels,
one for system + logging and the other for mailq, and maybe even a
third for mailbox storage, but that means master/slave disk on IDE
which is slower.

Of course, IDE is slower than SCSI, and SCSI cards with 64 or 128
megs on-board buffer would help the disk channel congestion, which
Wietse says is the limit to an MTA.

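Wietse seems to be saying not to use RAID5, because Postfix's workload is random writes and RAID5 handles those poorly. That makes sense: sequential, regular writes would be much faster.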

Posted by hzqbbc at 01:50 PM | Comments (0)

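Postfix + LDAP performance test and analysis

ChangeLog

·2005-04-16 Content reorganized
·2005-04-10 Collected into the blog's Postfix column
·2003-01-18 First version, published at http://www.hzqbbc.com/forum/

1. Test platform
Dual PIII 933 + 512 MB RAM + 18 GB SCSI disk
Postfix 2.0.0.1 + OpenLDAP 2.0.11 (default install) + virtual delivery

2. With HASH-format lookup tables, SMTP injection ran at a little over 5000 messages/minute, peaking at 87.5 messages/second.

These runs sent headers only over 100 concurrent connections. Postfix's default process limit was raised to 500 (default_process_limit = 500), which also increased concurrency. Making syslogd write asynchronously (by prefixing the log file path with a minus sign) also improves speed substantially: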

mail.*              -/var/log/maillog

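3. With user data stored in LDAP, SMTP injection ran at about 2500-2800 messages/minute. I then simplified the lookups: transport_maps became a fixed text table and only virtual_mailbox_maps used LDAP, which raised the rate to about 3200-3500 messages/minute. The recipient set was fixed at 10 addresses.

4. With 500,000 user records stored in LDAP, I ran tests with 500, 1000 and 2000 recipient addresses. SMTP injection speed turned out to depend very little on the number of addresses used. The results for 2000 addresses: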

Postal result:
time,msg/m,KB,error,connection,SSL 
10:26,3779,1516,2,1930,0 
10:27,3889,1556,0,1953,0 
10:28,3987,1592,0,1997,0 
10:29,4174,1667,0,2067,0 
10:30,4308,1720,0,2166,0 
10:31,4039,1613,0,1986,0 
10:32,2660,1061,0,1338,0 
10:33,2436,972,0,1216,0 

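SMTP injection is quite fast. Here messages contained headers only, with 50 concurrent connections and only 2-3 messages per connection, and the result looks good. After 10:32, however, the rate starts to fall. I checked /var/spool/postfix and found that the number of files in incoming had risen sharply. Probably cleanup could not process incoming mail fast enough and the queue backed up, though LDAP responses slowing down is another possibility.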
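The point where a run starts to degrade can be located automatically. The following Python sketch flags every minute whose rate falls below a chosen fraction of the best rate seen so far; the 70% threshold and the name `find_slow_minutes` are arbitrary assumptions, and the series is the message column of the 2000-address run above.

```python
# (time, messages/minute) pairs copied from the Postal result above.
SERIES = [
    ("10:26", 3779), ("10:27", 3889), ("10:28", 3987), ("10:29", 4174),
    ("10:30", 4308), ("10:31", 4039), ("10:32", 2660), ("10:33", 2436),
]


def find_slow_minutes(series, fraction=0.7):
    """Return the timestamps whose rate is below `fraction` of the running peak."""
    slow = []
    best = 0
    for stamp, rate in series:
        best = max(best, rate)
        if rate < fraction * best:
            slow.append(stamp)
    return slow


print(find_slow_minutes(SERIES))
```

For this run the flagged minutes are 10:32 and 10:33, which is when the incoming directory was found to be backing up.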

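Summary: performance tuning tips

·Turn off LDAP logging by setting its log level to 0
·Make logs that are written and synced frequently, such as maillog, use asynchronous writes, as described above
·Raise Postfix's default_process_limit above 150
·Make the /var/spool/postfix directory asynchronous using the chattr command

5. Results for 50 concurrent connections, 2000 addresses, headers only, 3 messages per connection: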

 
11:17,3613,1447,0,1849,0 
11:18,3876,1547,0,1930,0 
11:19,4088,1634,0,2080,0 
11:20,3920,1566,0,1977,0 
11:21,4057,1621,0,2021,0 
11:22,3946,1573,0,1960,0 
11:23,3316,1324,0,1695,0 
11:24,2576,1030,0,1295,0 
11:25,2889,1155,0,1439,0 
^^^^^^^^^^^^^^^^^^^^^^*
11:26,3984,1596,0,1990,0 
11:27,3986,1588,0,1993,0 
11:28,4164,1662,0,2064,0 
11:29,3973,1591,0,1986,0 
11:30,3991,1592,0,1982,0 
11:31,3521,1406,0,1758,0 
11:32,2590,1035,0,1297,0 
^^^^^^^^^^^^^^^^^^^^^^*
11:33,2655,1058,0,1322,0 
11:34,2745,1097,0,1348,0 

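Note the periods marked with ^^* above, where SMTP injection performance drops sharply. Checking /var/spool/postfix showed that qmgr had reached its processing limit, which is set by the qmgr_message_active_limit parameter. As a result incoming grew considerably and Postfix began to slow down.

All the tests above were run with default_destination_concurrency_limit = 10, so when virtual delivers to local mailboxes there is very little concurrency. The active directory grows quickly, there are too few virtual processes to deliver, and mail cannot be written to disk fast enough.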
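The backlog effect described above can be approximated with a simple rate model: when injection outpaces delivery, the difference accumulates until the active limit is reached. This Python sketch is an idealized illustration, not a description of how qmgr works internally; the function name and the example numbers are made-up round figures.

```python
import math


def minutes_until_active_full(inject_per_min, deliver_per_min, active_limit, backlog=0):
    """Minutes until the backlog reaches `active_limit`, or None if delivery keeps up."""
    growth = inject_per_min - deliver_per_min
    if growth <= 0:
        return None  # the queue drains at least as fast as it fills
    return math.ceil((active_limit - backlog) / growth)


# Hypothetical example: 4000 messages/minute injected, 1300/minute delivered,
# and an active limit of 10000 messages.
print(minutes_until_active_full(4000, 1300, 10000))
```

With these assumed figures the limit is reached within a few minutes, which is the same order of magnitude as the interval between the slowdowns marked in the log.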

6. Raising default_destination_concurrency_limit:

default_destination_concurrency_limit = 1000

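This exposes a different problem: the active queue stays at only about 200-300 messages, but the defer/deferred directories start to grow and SMTP injection drops to only about 1850 messages/minute. The number of concurrent virtual processes reached 400!

The likely explanation is that the much higher rate and volume of virtual deliveries makes disk writes too frequent, disk I/O becomes the constraint, and performance falls.

Solution: use distributed storage with a mail-switch style design, so that final delivery is performed by software on the storage machines.

Changing default_destination_concurrency_limit to 100 improved speed somewhat, but produced many lookup failures: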

time,messages,data(K),errors,connections,SSL connections 
Server error:451 <user10798@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user10700@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
11:57,2773,1124,2,1459,0 
11:58,2302,920,0,1156,0 
11:59,2492,996,0,1264,0 
12:00,2204,879,0,1087,0 
Server error:451 <user10380@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11876@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
12:01,2311,924,2,1146,0 
Server error:451 <user10559@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
12:02,2487,993,1,1265,0 
12:03,2284,912,0,1112,0 
12:04,1515,602,81,808,0 
              ^^ 81 errors
12:05,1969,787,11,1012,0 
12:06,2139,858,0,1073,0 
12:07,2174,868,0,1095,0 
12:08,2214,884,0,1093,0 
12:09,2252,900,0,1118,0 
Server error:451 <user11171@bigmail.hzqbbc.com>: Temporary lookup failure

Concurrency lowered to 50:

default_destination_concurrency_limit = 50

Test log:

Server error:451 <user10062@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
12:21,2645,1064,2,1379,0 
12:22,3033,1211,0,1503,0 
12:23,2996,1206,0,1472,0 
Server error:451 <user10413@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
12:24,2911,1152,1,1455,0 
12:25,2716,1093,0,1359,0 
12:26,2844,1131,0,1434,0 
12:27,2748,1096,0,1360,0 
12:28,2738,1107,0,1396,0 
12:29,2652,1051,0,1352,0 
12:30,2759,1108,0,1355,0 
Server error:451 <user11773@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11684@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user10859@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11567@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11039@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11336@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user10781@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
12:31,2472,987,13,1253,0
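Logs like the one above mix per-minute rows with the server's 451 replies, so the error rate has to be extracted from both kinds of line. The following Python sketch shows one way to do that; the regular expressions and the name `summarize` are assumptions based only on the line formats visible in this log, and the sample is the first few lines of the run above.

```python
import re

# Excerpt copied from the beginning of the log above.
LOG = """\
Server error:451 <user10062@bigmail.hzqbbc.com>: Temporary lookup failure
.
12:21,2645,1064,2,1379,0
12:22,3033,1211,0,1503,0
12:23,2996,1206,0,1472,0
Server error:451 <user10413@bigmail.hzqbbc.com>: Temporary lookup failure
.
12:24,2911,1152,1,1455,0
"""

ROW = re.compile(r"^(\d{2}:\d{2}),(\d+),(\d+),(\d+),(\d+),(\d+)\s*$")
ERR = re.compile(r"^Server error:(\d{3}) <([^>]*)>")


def summarize(log_text):
    """Return (messages, errors, rejected) where rejected lists (code, address)."""
    messages = 0
    errors = 0
    rejected = []
    for line in log_text.splitlines():
        row = ROW.match(line)
        if row:
            messages += int(row.group(2))  # second column: messages sent
            errors += int(row.group(4))    # fourth column: errors counted by Postal
            continue
        err = ERR.match(line)
        if err:
            rejected.append((int(err.group(1)), err.group(2)))
    return messages, errors, rejected


messages, errors, rejected = summarize(LOG)
print(f"{errors} errors in {messages} messages ({errors / messages:.4%})")
```

Even in the runs that look noisy, the temporary lookup failures are a very small fraction of the traffic; what matters more is whether they cluster in particular minutes.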

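With Postal opening 100 concurrent connections, a maximum message size of 10k, 1 message per connection and 5000 messages sent, there were a great many failures, although the speed was still quite good.

Next, Postal with 50 concurrent connections, 1 message per connection, and the OpenLDAP cache enabled: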

time,messages,data(K),errors,connections,SSL connections 
23:03,3540,19196,46,3685,0 
23:04,4246,23136,0,4247,0 

23:05,4182,22873,19,4194,0 
                ^^ 19 errors
23:06,3736,19922,74,3808,0 
                ^^ more than 70 errors!
23:33,3683,3359,0,3733,0 
23:34,3935,3632,0,3935,0 
23:35,4041,3664,0,4041,0 
23:36,3990,3636,0,3989,0 
23:37,4164,3798,0,4165,0 
23:38,3872,3512,0,3872,0 
23:39,3210,2945,0,3209,0 
23:40,2624,2400,0,2625,0 
Server error:451 <user10717@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user10614@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user10634@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
23:41,2502,2286,3,2505,0 
23:42,2760,2508,0,2759,0 
Server error:451 <user11449@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11660@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
23:43,2528,2305,2,2531,0 
23:44,2078,1972,0,2078,0 
Server error:451 <user10298@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user10706@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11515@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user10499@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
23:45,1520,1387,4,1524,0 
23:46,1715,1536,0,1715,0 
23:47,1475,1362,0,1475,0 
23:48,1508,1371,0,1507,0 

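Evidently the active directory has reached its limit, so SMTP injection slows down and mail can no longer be processed fast enough.

Postal parameters adjusted: 50 concurrent connections, 1 message per connection, sending as fast as possible, OpenLDAP cache enabled, qmgr_message_active_limit=50000. The test ran for a little over an hour; the log follows: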

[root@ns1 postal-0.61]# ./do.sh 
time,messages,data(K),errors,connections,SSL connections 
Server error:451 <user11017@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user10714@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user10989@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
23:52,3663,3353,3,3714,0 
23:53,3784,3454,0,3786,0 
23:54,3747,3409,0,3747,0 
23:55,3861,3498,0,3861,0 
23:56,3922,3577,0,3922,0 
23:57,3822,3491,0,3822,0 
23:58,3748,3429,0,3747,0 
23:59,3608,3264,0,3609,0 
00:00,3320,3027,0,3319,0 
00:01,3410,3117,0,3411,0 
00:02,3317,3042,0,3316,0 
00:03,3580,3286,0,3581,0 
00:04,3507,3205,0,3507,0 
00:05,3475,3114,0,3475,0 
00:06,3259,2995,0,3258,0 
00:07,3186,2912,0,3186,0 
00:08,3552,3216,0,3553,0 
00:09,2887,2635,0,2887,0 
00:10,2517,2292,0,2517,0 
00:11,2721,2505,0,2721,0 
00:12,2574,2317,0,2574,0 
00:13,2587,2357,0,2587,0 
00:14,2573,2342,0,2573,0 
00:15,2704,2463,0,2704,0 
00:16,2647,2405,0,2647,0 
00:17,2585,2383,0,2585,0 
00:18,2421,2235,0,2421,0 
00:19,2573,2349,0,2572,0 
00:20,2378,2170,0,2378,0 
Server error:451 <user11351@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11458@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
...... 12 Temporary lookup failure errors during this period
00:21,2330,2144,15,2346,0 
00:22,2610,2376,0,2610,0 
00:23,2551,2317,0,2550,0 
00:24,2586,2380,0,2587,0 
00:25,2530,2282,0,2528,0 
00:26,2558,2306,0,2560,0 
00:27,2444,2232,0,2444,0 
00:28,2509,2250,0,2508,0 
00:29,2523,2293,0,2524,0 
00:30,2445,2210,0,2445,0 
00:31,2555,2345,0,2554,0 
00:32,2455,2225,0,2456,0 
00:33,2477,2266,0,2476,0 
Server error:451 <user10490@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user10822@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
00:34,2236,2038,2,2238,0 
00:35,2245,2061,0,2245,0 
00:36,2536,2311,0,2537,0 
00:37,2210,2004,0,2210,0 
00:38,2385,2150,0,2385,0 
00:39,2463,2242,0,2463,0 
00:40,2496,2291,0,2495,0 
00:41,2471,2261,0,2471,0 
00:42,2422,2208,0,2422,0 
00:43,2370,2159,0,2371,0 
00:44,2387,2197,0,2386,0 
00:45,2203,2028,0,2203,0 
00:46,2494,2301,0,2494,0 
00:47,2313,2133,0,2314,0 
00:48,2255,2016,0,2255,0 
00:49,2428,2236,0,2428,0 
00:50,2327,2113,0,2326,0 
00:51,2436,2244,0,2437,0 
00:52,2334,2127,0,2334,0 
00:53,2399,2202,0,2398,0 
00:54,2271,2078,0,2272,0 
00:55,2255,2046,0,2255,0 
00:56,2397,2205,0,2397,0 
00:57,2392,2183,0,2392,0 
00:58,2364,2161,0,2364,0 
00:59,2241,2031,0,2240,0 
01:00,2354,2137,0,2355,0 
01:01,2223,2000,0,2223,0 
01:02,2326,2118,0,2325,0 
01:03,2343,2155,0,2344,0 
01:04,2388,2209,0,2388,0 

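About 130,000 messages were left in the queue. In a little over 70 minutes about 220,000 messages were injected, of which more than 95,000 were delivered correctly and written to the users' $HOME directories. Since delivery concurrency was deliberately limited, this result was to be expected.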

[root@ns2 postfix]# postsuper -d ALL 
postsuper: Deleted: 137410 messages 
[root@ns2 postfix]# du -sk /home/domains/bigmail.hzqbbc.com/ 
392124  /home/domains/bigmail.hzqbbc.com 
[root@ns2 postfix]# find /home/domains/bigmail.hzqbbc.com/ | wc -l 
 95349 
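The three figures reported by postsuper, find and the Postal log can be combined into a rough balance. The Python sketch below makes two assumptions that the source does not confirm: the queue was empty when the run started, and every entry counted by `find | wc -l` is one delivered message (the count also includes directories, so it is slightly high). The function name `balance` is illustrative.

```python
QUEUED = 137410     # "postsuper: Deleted: 137410 messages"
DELIVERED = 95349   # entries counted by find | wc -l (includes directories)
MINUTES = 73        # rows from 23:52 through 01:04 in the Postal log


def balance(queued, delivered, minutes):
    """Return totals and average per-minute rates for one test run."""
    accepted = queued + delivered
    return {
        "accepted": accepted,
        "delivered_share": delivered / accepted,
        "inject_per_min": accepted / minutes,
        "deliver_per_min": delivered / minutes,
    }


result = balance(QUEUED, DELIVERED, MINUTES)
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

Under these assumptions a little over 40% of the accepted mail was delivered during the run, at an average of about 1,300 messages/minute, well below the injection rate.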

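7. A new round of tests

Postal with 50/100 concurrent connections and 1-3 messages per connection. Postfix was upgraded to Snapshot 2.0.2 with the proxymap daemon enabled, and the following tests were run: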

 
[root@ns1-bjcnc postal-0.61]# ./do.sh   
time,messages,data(K),errors,connections,SSL connections 
Server error:451 <user11097@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user10170@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11333@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
14:02,3743,3366,3,1880,0 
14:03,4244,3809,0,2129,0 
14:04,4345,3915,0,2204,0 
14:05,4187,3790,0,2114,0 
[root@ns1-bjcnc postal-0.61]# ./do.sh 
time,messages,data(K),errors,connections,SSL connections 
14:10,4380,3973,5,2289,0 
14:11,4481,4065,0,2215,0 
14:12,4541,4083,0,2260,0 
. 
14:13,2741,2455,14,1389,0 
14:14,3927,3549,0,1949,0 
14:15,3611,3235,0,1807,0 
14:16,3634,3304,0,1796,0 
14:17,3889,3475,0,1933,0 
[root@ns1-bjcnc postal-0.61]# ./do.sh 
time,messages,data(K),errors,connections,SSL connections 
14:22,1838,1644,0,961,0 
Server error:451 <user11857@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11434@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
14:23,1910,1721,2,959,0 
Server error:451 <user10268@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11688@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
14:24,1989,1769,2,1007,0 
Server error:451 <user11264@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11066@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
14:25,1828,1652,2,915,0 
[root@ns1-bjcnc postal-0.61]# ./do.sh 
Can't open config file "conver".  Doing no expansion. 
time,messages,data(K),errors,connections,SSL connections 
Server error:451 <user11182@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
14:27,4110,3721,1,2105,0 
14:28,289,221,0,149,0 

[root@ns1-bjcnc postal-0.61]# cat do.sh

#!/bin/sh 
MAXMSGSIZE=1 
PROC=50 
MSGPERCONN=3 
MSGPERMIN=5000 
SSLPERCENT=0 
SMTP_HOST=210.82.193.91 
./postal -m $MAXMSGSIZE -p $PROC -c $MSGPERCONN -r $MSGPERMIN -a \
  -b netscape $SMTP_HOST myuser conver 
[root@ns1-bjcnc postal-0.61]# ./do.sh 
time,messages,data(K),errors,connections,SSL connections 
. 
14:38,4213,3793,10,2176,0 
14:39,4308,3869,0,2174,0 
14:40,4296,3835,0,2160,0 
14:41,4261,3850,0,2136,0 
14:42,4342,3930,0,2168,0 
14:43,4258,3843,0,2139,0 
14:44,4172,3769,0,2067,0 
14:45,4111,3684,0,2052,0 
14:46,3920,3577,0,1961,0 
14:47,3987,3590,0,1987,0 
14:48,3729,3365,0,1851,0 
14:49,3702,3318,0,1842,0 
14:50,3745,3371,0,1879,0 
14:51,3961,3558,0,1969,0 
14:52,3753,3415,0,1876,0 
14:53,3498,3139,0,1781,0 
14:54,3337,3001,0,1661,0 
14:55,2671,2398,0,1341,0 
14:56,2722,2453,0,1365,0 
14:57,2648,2388,0,1342,0 

14:58,2466,2199,7,1227,0 
14:59,2601,2344,0,1299,0

These repeated runs show that enabling the proxymap daemon clearly improves SMTP injection speed, by 10%-15%.

Next, the concurrency of virtual delivery was limited:
[root@ns1-bjcnc postal-0.61]# ./do.sh
time,messages,data(K),errors,connections,SSL connections
20:44,4289,3902,0,2166,0
20:45,4329,3874,0,2169,0
Server error:451 : Temporary lookup failure
.
12 Temporary lookup failure errors occurred during this period
20:46,3993,3594,13,1983,0
20:47,4422,4018,0,2216,0
20:48,4639,4163,0,2323,0

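The runs above had final delivery limited to 20 concurrent virtual processes.
========================================
Lowering the limit to 10 increased speed, which shows that the bottleneck is still disk I/O (more concurrent deliveries visibly hurt I/O) and that LDAP still has headroom.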
[root@ns1-bjcnc postal-0.61]# ./do.sh
Can't open config file "conver". Doing no expansion.
time,messages,data(K),errors,connections,SSL connections
20:53,5172,4690,0,2636,0
20:54,5074,4530,0,2529,0
20:55,4988,4529,0,2494,0
20:56,4856,4376,0,2456,0
20:57,5114,4602,0,2530,0
20:58,4720,4251,0,2345,0
20:59,4490,4038,0,2230,0
21:00,4716,4234,0,2363,0
21:01,4355,3931,0,2188,0
21:02,4488,4046,0,2241,0
21:03,4495,4062,0,2271,0
21:04,4680,4224,0,2363,0
21:05,4295,3882,0,2141,0
21:06,3822,3436,0,1921,0
21:07,2899,2597,0,1425,0
21:08,2863,2587,0,1447,0

The complete results:
[root@ns1-bjcnc postal-0.61]# ./do.sh
Can't open config file "conver". Doing no expansion.
time,messages,data(K),errors,connections,SSL connections
20:53,5172,4690,0,2636,0
20:54,5074,4530,0,2529,0
20:55,4988,4529,0,2494,0
20:56,4856,4376,0,2456,0
20:57,5114,4602,0,2530,0
20:58,4720,4251,0,2345,0
20:59,4490,4038,0,2230,0
21:00,4716,4234,0,2363,0
21:01,4355,3931,0,2188,0
21:02,4488,4046,0,2241,0
21:03,4495,4062,0,2271,0
21:04,4680,4224,0,2363,0
21:05,4295,3882,0,2141,0
21:06,3822,3436,0,1921,0
21:07,2899,2597,0,1425,0
21:08,2863,2587,0,1447,0
21:09,2887,2600,0,1473,0
21:10,2772,2494,0,1372,0
21:11,2106,1896,0,1033,0
21:12,1188,1079,0,584,0
21:13,1493,1355,0,742,0
21:14,1696,1540,0,828,0
21:15,1860,1677,0,920,0
21:16,1581,1428,0,784,0
21:17,2022,1837,0,1018,0
21:18,1477,1317,0,735,0
21:19,1511,1375,0,760,0
21:20,1877,1691,0,950,0
21:21,1459,1314,0,725,0
21:22,1494,1321,0,740,0
21:23,1526,1386,0,756,0
21:24,1309,1188,0,643,0
21:25,1446,1288,0,751,0
21:26,1396,1252,0,722,0
21:27,1431,1260,0,713,0
21:28,1312,1199,0,663,0
21:29,1290,1149,0,658,0
21:30,1190,1078,0,608,0
21:31,949,864,0,464,0
21:32,969,875,0,484,0
21:33,1210,1095,0,625,0
21:34,1384,1255,0,681,0
21:35,1332,1207,0,645,0
21:36,1018,928,0,503,0
21:37,882,795,0,442,0
21:38,1222,1095,0,613,0
21:39,1171,1051,0,581,0
21:40,1222,1123,0,609,0
21:41,1223,1088,0,606,0
21:42,1220,1086,0,607,0
21:43,1328,1208,0,665,0
21:44,1409,1263,0,719,0
21:45,1488,1349,0,723,0
21:46,1237,1115,0,628,0
21:47,1160,1035,0,586,0
21:48,1270,1146,0,626,0
21:49,1400,1261,0,708,0

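From 21:10 onward the active queue was close to its limit and processing could no longer keep up. Also, postal was capped at 5000 messages per minute, so the 51xx messages/minute peak in this test may not be the true ceiling; the real figure could be higher.

New tests aimed at higher injection speed

Acting on an idea from IBM developerWorks, I changed the mount options of the partition holding the Postfix queue to noatime. With noatime, a file's atime is not updated on access, which reduces disk writes; this improved performance by 5%.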
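Whether a partition is actually mounted with noatime can be verified from the kernel's mount table. The following Python sketch parses text in the /proc/mounts format; the sample device names and mount points are hypothetical, not taken from the test machines, and `has_mount_option` is an illustrative name.

```python
def has_mount_option(mounts_text, mount_point, option):
    """Return True if `mount_point` is mounted with `option`.

    `mounts_text` uses the /proc/mounts layout:
    device, mount point, file system type, comma-separated options, two numbers.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return option in fields[3].split(",")
    return False


# Hypothetical sample; on a live Linux system the text would come from
# open("/proc/mounts").read().
SAMPLE = """\
/dev/sda1 / ext3 rw 0 0
/dev/sda3 /var/spool/postfix ext3 rw,noatime 0 0
"""

print(has_mount_option(SAMPLE, "/var/spool/postfix", "noatime"))
```

Checking the live mount table is worthwhile after editing /etc/fstab, because the new option only takes effect once the partition has been remounted.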

Tuning the disk write latency with elvtune clearly improves speed.

Linux iostat and related I/O monitoring tools are worth using here. The figures after tuning:

time,messages,data(K),errors,connections,SSL connections 
23:57,5325,4804,0,2703,0 
23:58,5480,4905,0,2702,0 
23:59,5160,4624,0,2573,0 
00:00,5211,4689,0,2609,0 
00:01,4918,4441,0,2453,0 
00:02,4872,4377,0,2395,0 
00:03,5039,4564,0,2492,0 
00:04,4694,4223,0,2308,0 

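This is a clear improvement. Repeated runs showed peaks of 63xx messages/minute in some periods. I suspect this comes from Postfix's multiplexing and from the disk I/O cache: multiplexing reuses SMTP channels that are already open, which reduces the overhead of creating processes.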

[root@ns1-bjcnc postal-0.61]# ./do.sh 
time,messages,data(K),errors,connections,SSL connections 
22:23,6330,5721,0,3213,0 
22:24,5592,5038,0,2783,0 
22:25,5069,4555,0,2509,0 
22:26,4624,4203,0,2333,0 
22:27,4778,4311,0,2438,0 
22:28,4396,3943,0,2215,0 
[root@ns1-bjcnc postal-0.61]# ./do.sh 
Can't open config file "conver".  Doing no expansion. 
time,messages,data(K),errors,connections,SSL connections 
22:32,5121,4615,0,2611,0 
22:33,4696,4263,0,2352,0 
22:34,4532,4094,0,2277,0 
22:35,4622,4138,0,2297,0 
22:36,5006,4515,0,2493,0 
22:37,4487,4052,0,2284,0 
22:38,4595,4121,0,2300,0 
22:39,4669,4221,0,2300,0 
22:40,4381,3961,0,2149,0 
22:41,4217,3781,0,2100,0 
[root@ns1-bjcnc postal-0.61]# ./do.sh 
time,messages,data(K),errors,connections,SSL connections 
22:44,5148,4642,0,2629,0 
22:45,4868,4390,0,2414,0 
22:46,4890,4392,0,2451,0 
22:47,4803,4306,0,2419,0 
22:48,5005,4552,0,2493,0 
22:49,4435,3976,0,2221,0 
22:50,4515,4055,0,2246,0 
22:51,4666,4187,0,2336,0 
22:52,4556,4105,0,2283,0 
22:53,4472,3979,0,2239,0 
22:54,4136,3749,0,2057,0 
22:55,4649,4172,0,2355,0 
22:56,4304,3895,0,2142,0 
22:57,4055,3607,0,2008,0 
22:58,2692,2425,0,1335,0 
22:59,2690,2402,0,1346,0 
23:00,2796,2533,0,1379,0 
23:01,2738,2448,0,1396,0 
23:02,2522,2287,0,1252,0 
23:03,2574,2337,0,1286,0 
[root@ns1-bjcnc postal-0.61]# ./do.sh 
time,messages,data(K),errors,connections,SSL connections 
23:08,6084,5525,0,3068,0 
23:09,5834,5254,0,2912,0 
23:10,4683,4205,0,2303,0 
23:11,4859,4376,0,2453,0 
23:12,4265,3814,0,2142,0 
23:13,4595,4179,0,2282,0 
23:14,4466,4002,0,2217,0 
23:15,4438,3982,0,2201,0 
23:16,4463,4033,0,2242,0 
23:17,4592,4123,0,2327,0 
23:18,4234,3802,0,2097,0 
23:19,4545,4049,0,2265,0 
23:20,3853,3437,0,1925,0 
23:21,2708,2417,0,1349,0 
23:22,2730,2452,0,1363,0 
[root@ns1-bjcnc postal-0.61]# ./do.sh 
Can't open config file "conver".  Doing no expansion. 
time,messages,data(K),errors,connections,SSL connections 
23:42,5327,4852,0,2721,0 
23:43,5201,4653,0,2631,0 
23:44,5377,4815,0,2705,0 
23:45,4899,4431,0,2398,0 
23:46,4863,4378,0,2421,0 
23:47,4885,4382,0,2460,0 
23:48,5012,4523,0,2496,0 
23:49,5038,4501,0,2561,0 
23:50,4898,4427,0,2431,0 
23:51,4888,4400,0,2494,0 
23:52,4856,4350,0,2398,0 
Server error:451 <user10945@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11208@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user10766@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
Server error:451 <user11461@bigmail.hzqbbc.com>: Temporary lookup failure 
. 
23:53,4584,4115,4,2306,0 
23:54,4693,4243,0,2336,0 

Next, I tuned sda3, the disk partition that holds the queue:

(queue) r_d w_d from 4096/8192 -> 4096/2048

After clearing the queue and retesting, injection was considerably faster.

Results before adjusting the disk read/write latency (r/w -> 8192/4096):

time,messages,data(K),errors,connections,SSL connections 
00:08,5280,4790,0,2704,0 
00:09,5135,4570,0,2546,0 
00:10,5280,4810,0,2635,0 
00:11,5082,4572,0,2547,0 
00:12,5195,4648,0,2596,0 
00:13,5053,4591,0,2549,0 
00:14,5089,4568,0,2583,0 
00:15,4937,4438,0,2485,0 
00:16,4727,4312,0,2340,0 
00:17,4396,3958,0,2224,0 
00:18,4860,4353,0,2413,0 
00:19,4451,3994,0,2227,0 
00:20,4582,4132,0,2300,0 
00:21,2809,2486,0,1439,0 

After changing to r/w -> 4096/2048, the run recorded 6xxx messages/minute six times in a row (five of those minutes are shown below):

[root@ns1-bjcnc postal-0.61]# ./do.sh 
time,messages,data(K),errors,connections,SSL connections 
00:23,6247,5597,0,3174,0 
00:24,6581,5891,0,3280,0 
00:25,6544,5898,0,3236,0 
00:26,6457,5737,0,3235,0 
00:27,6283,5690,0,3137,0
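Because the two runs have different lengths, a fair comparison uses only the minutes both have in common. The Python sketch below compares the first minutes of each run; the function name `gain_over_common_window` is made up, and the lists are the message columns of the two logs above.

```python
# Messages per minute before and after the elvtune change, copied from above.
BEFORE = [5280, 5135, 5280, 5082, 5195, 5053, 5089, 4937,
          4727, 4396, 4860, 4451, 4582, 2809]
AFTER = [6247, 6581, 6544, 6457, 6283]


def gain_over_common_window(before, after):
    """Percent change of the mean rate over the first min(len) minutes of each run."""
    window = min(len(before), len(after))
    mean_before = sum(before[:window]) / window
    mean_after = sum(after[:window]) / window
    return (mean_after - mean_before) / mean_before * 100.0


gain = gain_over_common_window(BEFORE, AFTER)
print(f"first-minutes gain: {gain:.1f}%")
```

Restricting both runs to the same window avoids penalizing the longer run for the slowdown that sets in once the active queue fills up.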

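Conclusions

LDAP's speed is not in doubt. Given good enough hardware, the system bottleneck is usually disk I/O, and LDAP is generally faster than the disk. Tuning Postfix's injection concurrency, total process count, delivery concurrency and so on takes a great deal of testing to find the best performance attainable on a specific platform and hardware, and from that to settle on sensible values.

A busy mail system places very high demands on disk I/O, especially on the disk that holds the queue. Without spending more on hardware, iostat, vmstat and related I/O analysis tools can show where the disk limits lie, and elvtune can adjust the disk read/write latency (delay) to reduce the performance loss those limits cause.

With a sufficient budget, one can use a disk system with NVRAM, Ultra 320 disks, or even a RAID 0+1 array for the queue, which raises speed effectively. One report describes a company abroad that used Sun storage under Solaris to reach 287 messages/second with sendmail, equivalent to 17,220 messages/minute. They achieved this with high-end Sun storage equipped with an NVRAM cache, very fast disks, and careful performance tuning.

This example shows that Sendmail can also perform very well; what matters is the ability and experience of the people who design and deploy the system.

The results in this article also show that Postfix performs respectably on low-end hardware (U160 SCSI disks and ordinary PIII 933 CPUs). At a typical processing rate of 4000 messages/minute, that amounts to 5.76 million ordinary-sized messages per day.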
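The rate conversions quoted in this article are simple to verify. This short Python sketch uses illustrative helper names and checks the three figures that appear in the text.

```python
MINUTES_PER_DAY = 24 * 60


def per_second_to_per_minute(rate):
    """Convert messages/second to messages/minute."""
    return rate * 60


def per_minute_to_per_day(rate):
    """Convert messages/minute to messages/day."""
    return rate * MINUTES_PER_DAY


print(per_second_to_per_minute(287))    # the sendmail report on Sun storage
print(per_second_to_per_minute(87.5))   # the HASH-table peak measured earlier
print(per_minute_to_per_day(4000))      # the sustained Postfix rate
```

The daily figure assumes the rate is held around the clock, so it is an upper bound for a real system with uneven traffic.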

Original article: http://www.hzqbbc.com/forum/read.php?forumid=2&filename=f_73

Posted by hzqbbc at 10:56 AM | Comments (0)

January 01, 2003

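Configuring per-user transport for distributed user storage

Advanced Postfix configuration

Storage is one of the keys to a high-capacity mail system, and storage is either centralized or distributed.

O Centralized storage uses fibre-channel disk arrays, SAN, iSCSI or similar unified storage to keep mail data in one place, or nearly so, with the application systems sharing that data. The main problem it raises is how to handle locking when several requests target the same file at once.

O Distributed storage spreads user data across multiple storage devices or machines, mainly by using some algorithm to distribute mail to the storage devices or hosts. In principle it works rather like a router: mail is routed to its final destination. The drawback is that management is more cumbersome, but shared-locking problems are rare.

How can Postfix be used to distribute user data storage? Here it is done with per-user transport.

Before Postfix 2.0.0 was released, per-user transport was available only in snapshots; now it can be used in the official release. The method described here applies to Postfix stable > 2.0.0.0 or snapshot versions > 1.1.11-200205xx.

1. Description
What is per-user transport, and what is it for? Put simply, it lets each user have a separate transport, much like the domain transport in 1.1.x, except that the key can now be user+extension@domain.tld. The benefit is that specific transports can be set for different users: some users can use the default local transport, others can go through virus filtering or maildrop, and some can even be passed to a nexthop for further processing. Possible applications: virus filtering, keyword filtering, spam filtering, and physical distribution or forwarding of mail.

My own impression: this feature resembles qmqp in qmail (central queue server <--> qmtpd/qmqpd machines) or the cluster feature in qmail-ldap, and it is far more powerful! qmail's qmqp/qmtp protocols are well designed and the ideas behind them are good, but the actual implementation is not good enough. Postfix offers the greatest flexibility.

If Postfix were to implement qmail's better features such as qmqp/qmtp in full and absorb qmail's strengths, the value of qmail would be diminished even further.

2. Configuration approach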

#
#                                    +------+
#          local                     |      |
#          handling                  | MTA1 |
#          layer                     |      | 
#                                    +------+
#                                        ^ foo@lvs.hzqbbc.com
#                                        | 
#                                    +--------+        +------+ 
#                    +------+        |        |        |      | 
#           inbound  |mail  |  ----->|switch  |----->  | MTA2 | bar@lvs.hzqbbc.com
#          email     +------+        |        |        |      | 
#                                    +--------+        +------+
#
#                                     mail switch      local handling layer

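Mail for different users is switched to different machines, which achieves the distribution. The configuration method follows.

3. Configuration
On the mail switch machine: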
postconf -n

alias_database = hash:/etc/postfix/aliases
alias_maps = hash:/etc/postfix/aliases
command_directory = /usr/sbin
config_directory = /etc/postfix
daemon_directory = /usr/lib/postfix
debug_peer_level = 2
inet_interfaces = all
local_recipient_maps = unix:passwd.byname $alias_maps $virtual_mailbox_maps
mail_owner = postfix
mailq_path = /usr/bin/mailq.postfix
manpage_directory = /usr/share/man
mydestination = $myhostname,$mydomain,$transport_maps
mydomain = LVS.hzqbbc.com
myhostname = LVS.hzqbbc.com
mynetworks = 192.168.0.0/16, 127.0.0.0/8
myorigin = $mydomain
newaliases_path = /usr/bin/newaliases.postfix
queue_directory = /var/spool/postfix
readme_directory = /etc/postfix/README_FILES
sample_directory = /etc/postfix/samples
sendmail_path = /usr/sbin/sendmail.postfix
setgid_group = postdrop
smtpd_banner = $myhostname ESMTP $mail_name Mail-Switch ($mail_version)
smtpd_recipient_restrictions = check_relay_domains,reject
transport_maps = hash:/mnt/disk/vmail/maps/transport
unknown_local_recipient_reject_code = 450 

MTA1 and MTA2 (which I named srv1 and srv2) are configured identically apart from the hostname, so only the srv2 example is listed:
postconf -n

alias_database = hash:/etc/postfix/aliases
alias_maps = hash:/etc/postfix/aliases
command_directory = /usr/sbin
config_directory = /etc/postfix
daemon_directory = /usr/lib/postfix
debug_peer_level = 2
inet_interfaces = all
local_recipient_maps = unix:passwd.byname $alias_maps $virtual_mailbox_maps
local_transport = virtual
mail_owner = postfix
mailq_path = /usr/bin/mailq.postfix
manpage_directory = /usr/share/man
mydestination = $myhostname,$mydomain, $virtual_mailbox_domains
mydomain = LVS.hzqbbc.com
myhostname = SRV2.LVS.hzqbbc.com
mynetworks = 192.168.0.0/16, 127.0.0.0/8
myorigin = $mydomain
newaliases_path = /usr/bin/newaliases.postfix
queue_directory = /var/spool/postfix
readme_directory = /etc/postfix/README_FILES
sample_directory = /etc/postfix/samples
sendmail_path = /usr/sbin/sendmail.postfix
setgid_group = postdrop
smtpd_banner = $myhostname ESMTP $mail_name ($mail_version)
transport_maps = hash:/mnt/disk/vmail/maps/vuser_transport
unknown_local_recipient_reject_code = 450
virtual_gid_maps = static:250
virtual_mailbox_base = /mnt/disk/vmail
virtual_mailbox_domains = lvs.hzqbbc.com
virtual_mailbox_maps = hash:/mnt/disk/vmail/maps/mailbox
virtual_uid_maps = static:250 

Next come the map files.
The directory /mnt/disk/vmail/maps contains the following:
cat mailbox

LVS.hzqbbc.com      anything
hzq@LVS.hzqbbc.com  hzq/Maildir/
ben@LVS.hzqbbc.com  ben/Maildir/
foo@LVS.hzqbbc.com  foo/Maildir/
bar@LVS.hzqbbc.com  bar/Maildir/ 

cat transport

foo@LVS.hzqbbc.com  smtp:[SRV1.LVS.hzqbbc.com]
bar@LVS.hzqbbc.com  smtp:[SRV2.LVS.hzqbbc.com]

cat vuser_transport

foo@LVS.hzqbbc.com  virtual:
bar@LVS.hzqbbc.com  virtual: 

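With this setup, outside users see only a single Postfix mail switch/hub. Mail delivered to the switch is forwarded to the appropriate back-end machine (srv1 or srv2) and delivered locally there. Scaling the system is also simple: add 1 to n more switch machines and the corresponding MX records. Load is spread fairly evenly without any LVS equipment.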
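The routing decision made on the switch can be pictured as a table lookup keyed on the recipient. The Python sketch below is a simplified model, not Postfix's code: it tries the full address, then the address without its +extension, then the bare domain, and it folds everything to lowercase as a simplifying assumption. The real transport table supports further lookup forms that are left out here, and `lookup_transport` is an illustrative name.

```python
# Entries copied from the transport file above.
RAW_TRANSPORT = {
    "foo@LVS.hzqbbc.com": "smtp:[SRV1.LVS.hzqbbc.com]",
    "bar@LVS.hzqbbc.com": "smtp:[SRV2.LVS.hzqbbc.com]",
}
# Assumption for this sketch: keys are compared case-insensitively.
TRANSPORT = {key.lower(): value for key, value in RAW_TRANSPORT.items()}


def lookup_transport(table, recipient):
    """Return the transport:nexthop for `recipient`, or None if nothing matches."""
    address = recipient.lower()
    local, _, domain = address.rpartition("@")
    candidates = [address]
    if "+" in local:
        # user+extension@domain falls back to user@domain
        candidates.append(local.split("+", 1)[0] + "@" + domain)
    candidates.append(domain)
    for key in candidates:
        if key in table:
            return table[key]
    return None


print(lookup_transport(TRANSPORT, "foo@LVS.hzqbbc.com"))
print(lookup_transport(TRANSPORT, "bar+lists@lvs.hzqbbc.com"))
print(lookup_transport(TRANSPORT, "nobody@lvs.hzqbbc.com"))
```

A recipient with no matching entry returns None in this model; on a real switch such mail would be handled by whatever default routing the configuration provides.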

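Expanding back-end users and storage is not hard either: just keep adding srvN (N > 2). The drawbacks of this architecture are that it is not very efficient and uses quite a few machines, but the TCO is very low: little development is needed, since it uses Postfix's own features directly with some sensible extension.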
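When back ends are added, new users have to be assigned to one of them. The article does not say which algorithm was used, so the following Python sketch simply assumes a hash of the address modulo the number of back ends, and prints lines in the same layout as the transport file shown earlier. The function names are hypothetical.

```python
import hashlib


def backend_for(address, backends):
    """Pick a back end for `address` from a stable hash of the lowercased address."""
    digest = hashlib.sha1(address.lower().encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]


def transport_entries(addresses, backends):
    """Build transport-map lines of the form 'user@domain  smtp:[host]'."""
    return [f"{addr}  smtp:[{backend_for(addr, backends)}]" for addr in addresses]


BACKENDS = ["SRV1.LVS.hzqbbc.com", "SRV2.LVS.hzqbbc.com"]
for line in transport_entries(["hzq@LVS.hzqbbc.com", "ben@LVS.hzqbbc.com"], BACKENDS):
    print(line)
```

A modulo hash reassigns many users when the number of back ends changes, so in practice the computed host should be written into the transport map once, when the mailbox is created, and left alone afterwards. The explicit per-user map is what makes that possible.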

Posted by hzqbbc at 09:27 PM | Comments (0)