Wednesday, December 19, 2007

Running Hundreds Of Evolution Users

Over the last few months, the number of concurrent Evolution users has grown, and calendar and email usage has grown with it. Evolution makes heavy use of cache when interacting with GroupWise, and we were starting to have disk IO performance problems from 8:00am until about 10:00am. This is the period of heaviest load, when everyone is downloading heavily from the post office. The disk drives were getting so busy that the UI was starting to slow and performance was not acceptable. Interestingly, CPU usage was under 10% even with 250 people.

We had just ordered our annual allotment of servers. We use a trickle approach and move servers around based on changing requirements. Three new HP DL580G5s arrived. The older DL580s used 15K 3.5" Ultra320 SCSI 72GB drives. The new server uses 10K 2.5" SAS 146GB drives. Below are the specs, the difference in speed is amazing. We had also previously used RAID 5, which might have contributed to the bottleneck. We configured the new server with RAID 1+0. After a backup and restore of /home and an IP address change, the new server went live immediately with only a 15-minute interruption. The new server is so fast that when you hit [Send/Receive] the dialog flashes and disappears so quickly it can barely be seen.
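
One rough way to see why the RAID level alone could matter for a workload that writes a lot of cache data is the usual rule-of-thumb write penalty: about four back-end I/Os per write on RAID 5 versus two on RAID 1+0. The sketch below is only an estimate under that assumption. The drive count, per-drive IOPS, and write fraction are hypothetical placeholders, not measurements from these servers, and controller cache is ignored.

```python
# Rule-of-thumb back-end I/Os needed per front-end write.
# These are textbook approximations, not measured values.
WRITE_PENALTY = {"raid5": 4, "raid10": 2}


def effective_iops(drives, iops_per_drive, write_fraction, level):
    """Estimate the front-end IOPS an array can sustain for a read/write mix."""
    raw = drives * iops_per_drive
    penalty = WRITE_PENALTY[level]
    # Each read costs one back-end I/O; each write costs `penalty` of them.
    cost_per_request = (1 - write_fraction) + write_fraction * penalty
    return raw / cost_per_request


if __name__ == "__main__":
    # Hypothetical array: 8 drives at 150 IOPS each, 70% writes.
    for level in ("raid5", "raid10"):
        print(level, round(effective_iops(8, 150, 0.7, level)))
```

With the same drives, the estimate for RAID 1+0 comes out well ahead of RAID 5 as soon as writes dominate, and the two converge as the mix shifts toward reads.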

Disk performance is excellent even with 250 people:

iostat -x 10

Linux 2.6.16.54-0.2.3-bigsmp (oa3) 12/20/07
avg-cpu:    %user    %nice  %system  %iowait   %steal    %idle
             0.97     0.00     0.24     0.01     0.00    98.81

Device:      rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await    svctm    %util
cciss/c0d0     0.00    71.90     0.00    22.30     0.00  1507.20     0.00   753.60    67.59     0.01     0.43     0.25     0.56


Previously, %util on the drives ran from 75% to 100%.
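
To keep an eye on this kind of saturation without watching the terminal, the device table can be parsed with a short script. This is a minimal sketch, assuming the column layout shown above (a header line that starts with "Device:"). The function names and the 75% threshold are illustrative choices, not part of iostat.

```python
# Sample taken from the iostat output shown in this post.
SAMPLE = """\
avg-cpu: %user %nice %system %iowait %steal %idle
0.97 0.00 0.24 0.01 0.00 98.81

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0.00 71.90 0.00 22.30 0.00 1507.20 0.00 753.60 67.59 0.01 0.43 0.25 0.56
"""


def parse_iostat_devices(text):
    """Parse the device table of `iostat -x` text into {device: {column: value}}."""
    stats = {}
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            columns = None              # a blank line ends the current table
        elif fields[0] == "Device:":
            columns = fields[1:]        # the header row names the columns
        elif columns and len(fields) == len(columns) + 1:
            # If several reports are present, later ones overwrite earlier ones.
            stats[fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return stats


def busy_devices(stats, threshold=75.0):
    """Return the devices whose %util is at or above the threshold."""
    return sorted(dev for dev, row in stats.items() if row["%util"] >= threshold)


if __name__ == "__main__":
    stats = parse_iostat_devices(SAMPLE)
    print(stats["cciss/c0d0"]["%util"], busy_devices(stats))
```

Fed the output of the old server, a check like this would presumably have listed the array as busy all morning; on the new one it reports nothing.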

Technical Specifications (older 72GB Ultra320 SCSI drive):

  • Hard Drive Capacity : 72.8GB
  • Generation: Ultra320
  • Data Transfer Rate: 320 MB/sec
  • Rotational Speed: 15,000 rpm
  • Form Factor (Drive): 3.5-inch low profile
  • Interface: Wide Ultra 320 SCSI; LVD
  • Data Storage Device Type: SAS (Server Attached Storage) device
  • Hard Drive Device Type: Hard drive for server/storage unit (Hot-plug)
  • Height: 1 inch
  • Pin Configuration: 80 pin Hot Swappable/ Pluggable
  • Hotswap Tray: Included (Attached)

Technical Specifications (newer 146GB SAS drive):
  • Hard Drive Capacity : 146GB
  • Generation: SAS
  • External Data Transfer Rate: 3.0 GB/sec
  • Rotational Speed: 10,000 rpm
  • Form Factor (Drive): 2.5-inch low profile
  • Interface: SAS (Serial Attached SCSI)
  • Data Storage Device Type: SAS (Server Attached Storage) device
  • Hard Drive Device Type: Hard drive for server/storage unit (Hot-plug)
  • Height: 0.591"
  • Width: 2.75"
  • Hotswap Tray: Included (Attached)
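
The two transfer rates listed above are not in the same units, so a quick conversion helps before comparing them. This is a small sketch, assuming the SAS figure is really a 3.0 gigabit-per-second line rate (as a commenter points out below) carried with 8b/10b encoding, while Ultra320 is already rated in megabytes per second. Both numbers describe the interface, not what a single drive can sustain.

```python
ULTRA320_MB_PER_S = 320.0   # parallel SCSI bus rating, already in megabytes/s


def sas_mb_per_s(line_rate_gbit=3.0):
    """Convert a SAS line rate in Gbit/s to usable megabytes per second."""
    # 8b/10b encoding puts 10 bits on the wire for every 8-bit byte of data.
    return line_rate_gbit * 1000 / 10


if __name__ == "__main__":
    sas = sas_mb_per_s()
    print(f"Ultra320: {ULTRA320_MB_PER_S:.0f} MB/s, SAS: {sas:.0f} MB/s per link")
```

On these assumptions the interface rates are nearly equal, so the spec sheet alone does not explain the speedup.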

8 comments:

Anonymous said...

Do you have the iostats from the old SCSI320 drives for comparison?

Just curious.

Anonymous said...

Wow, what a beast. I always enjoy reading about what the big dogs are using and running, thanks!

Dave Richards said...

I didn't keep the iostat from the old machine. The %util was pegged close to 100% all morning. Now it's flying all day, very cool technology.

Anonymous said...

Maybe the old filesystem was so fragmented that it caused disk I/O to crawl?

Anonymous said...

"Below are the specs, the difference in speed is amazing. "
Actually the specs are quite misleading; you compared 320 MB/s with 3GB/s, but those are actually 3Gb/s, so the difference is not that big, and that's also the interface speed, not the hard drive's.

I found some tests performed by HP comparing SAS and SCSI drives, which show that SAS drives are slower than 15k SCSI (but faster than 10k SCSI). I think that most of the performance improvement came from switching from RAID 5 to RAID 1+0. You could try switching an "old" server to RAID 1+0 and see how it goes :-D

http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00502620/c00502620.pdf

For example, the transfers/s tests starting at page 17 show that in a RAID 1+0 config: SAS - 628 tps; 15K SCSI - 757 tps; and in RAID 5: SAS - 496 tps; 15K SCSI - 565 tps.

However, the write latency of SAS is much lower than 15k SCSI but I don't know if this has such a big impact.


George

Dave Richards said...

George:
Excellent feedback on the specs. I copied and pasted the specs so it must have been a typo on the web. So it sounds more like the RAID level was the influence in this speedup. To me the drives "felt" faster even when configured on the same RAID level, but possibly that is because of server improvements made in the last few years.

Olaf van der Spek said...

SAS is Serial Attached SCSI
DAS is Direct Attached Storage, as opposed to NAS or SAN.

Anonymous said...

In addition to the RAID level, the RAID card and the amount of cache on the card have a huge effect, especially on workloads where there is a lot of concurrent access.

Also: how many drives were there before and after? Extra spindles can help access speeds, too.