Author:
Joseph L. Kaiser
FNAL:CD-OSS/SCS
jlkaiser@fnal.gov
630-840-6444
This is the CMS 3WARE IDE Experience Document
It describes the experience with the 3Ware server environment as set up for CMS.
The 3Ware servers that Fermilab purchased from Integral Corp. for testing came with the following hardware:
Three 3Ware 6800 8-port IDE cards
21 80GB IDE Maxtor hard drives
Two Pentium III 800 MHz processors
512 MB ECC memory
23 Kingston disk sleds
One American ProImage case (10 fans)
Two 400 W hot-swap power supplies
Two 8 GB system disks
AceNIC Gigabit Ethernet
BIOS Firmware
THINGS TO NOTE:
1. The two system disks were originally mirrored with software RAID following http://www.linuxdoc.org/HOWTO/Boot+Root+Raid+LILO-4.html
2. The files used for that setup are /etc/raidtab.old, /etc/lilo.conf.hda.old and /etc/lilo.conf.hdc.old. These are in the current /etc directory.
3. The installation of the firmware and OS is described in the 3Ware-Howto.txt document.
4. The /etc directory and everything in the /root directory, except the /root/integral directory, were saved. (It turns out that all the modifications needed to make the 3Ware cards work with the 2.2.18 kernel had been put in /root/integral. After looking at this, it was determined that these would not apply to an installation of the latest version of Fermi Red Hat Linux, since its new 2.2.19 kernel already included the 3Ware modules.) These files are saved as rimbaud.etc.tar.gz and rimbaud.tar.gz in my AFS space, in the ~/jlkaiser/CMS/servers/rimbaud directory. This directory also contains all the software tests that I ran after the initial reconfiguration; these duplicated the tests run by the sysadmin who originally set up this server. The scripts and results for the tests done after each of rimbaud's reconfigurations are in the "tests" subdirectory of the same AFS directory. There are some other critical files in this directory as well, such as the kickstart file used to install the CMSserver workgroup, the bonnie++ RPMs and the firmware upgrades from 3Ware.
I will go through the various configuration cycles of this particular machine.
CONFIGURATION CYCLE 1:
The goal of this cycle was to establish that the hardware performed as expected and that it performed just as well after a reinstall of the OS. After changing the hostname to rimbaud (French poet - if you don't know, it don't matter.), the same bonnie++ disk tests that had been run when the machine was first put together were repeated. The results of these tests are in the "tests" directory, in files named datatests.<somedate>, where <somedate> is earlier than April 21.
When the machine came to CMS, it had a 2.2.18 kernel with modifications from Integral. The 2.2.19 kernel was available in Fermi Red Hat Linux, so it was installed and the previous OS wiped out. This had two effects:
1. It totally blew away all the kernel configuration previously done by Integral.
2. It totally blew away the /root/integral directory, which was supposed to hold all the information on Integral's patches and configuration. This information proved important later.
After tests were performed to achieve a baseline, the machine was reinstalled.
There were two main changes in the configuration of the machine:
1. The two system disks, which had been mirrored with software RAID, were taken out of this configuration and were not put back into it after the reinstallation.
2. The OS was reinstalled.
The RAID arrays were not changed.
In the original configuration, the 3Ware driver was patched and built into a recompiled 2.2.18 kernel. Since the driver is included as a module in the official 2.2.19 kernel release, for CMS it was loaded as a run-time module instead. It turns out, however, that the line "/sbin/insmod 3w-xxxx" has to be added to rc.sysinit just after the line that activates the swap partitions in order for the drives to be recognized. Why this is the case is unclear, but this instruction was in the 3Ware documentation.
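The rc.sysinit edit described above can be scripted. The sketch below is a hypothetical helper, not something from the 3Ware documentation: it assumes the swap activation line is the first uncommented line containing the word "swapon" (check this against the actual rc.sysinit), and it returns the modified text rather than overwriting the file.

```python
INSMOD_LINE = "/sbin/insmod 3w-xxxx"


def add_insmod_after_swap(text: str) -> str:
    """Return the rc.sysinit text with INSMOD_LINE added right after the
    first uncommented line mentioning 'swapon' (assumed to be the line
    that activates the swap partitions).

    Returns the text unchanged if INSMOD_LINE is already present, and
    raises ValueError if no uncommented swapon line is found."""
    lines = text.splitlines(keepends=True)
    if any(line.strip() == INSMOD_LINE for line in lines):
        return text  # already patched; running twice is harmless
    for i, line in enumerate(lines):
        if "swapon" in line and not line.lstrip().startswith("#"):
            if not line.endswith("\n"):
                lines[i] = line + "\n"  # swap line was the last line
            lines.insert(i + 1, INSMOD_LINE + "\n")
            return "".join(lines)
    raise ValueError("no uncommented swapon line found")
```

In practice one would read rc.sysinit, pass its contents to add_insmod_after_swap, review the result, and only then write it back. Because already-patched text is returned unchanged, rerunning the helper does not add a second insmod line.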
With the OS installed, testing recommenced using the same tests. These were run after April 23, so the datatests.<somedate> files dated after April 23 reflect the post-reinstall results. Thus ended configuration cycle 1.
CONFIGURATION CYCLE 2:
Shortly after the reinstallation of the OS, CMS decided that they needed the server to use RAID5 striping. Since 3Ware had just added this to the firmware, the firmware and driver updates were downloaded from their website and installed. The disk arrays are configured with a BIOS utility, whose use is detailed in the 3Ware-Howto.txt document.
The RAID5 configuration gave horrible write performance: only about 3-4 MB/s on a good day, as can be seen in datatests.051001. This, however, was noticed later. About two weeks after the switch to RAID5, SDSS noticed severe file corruption on their 3Ware server when using RAID5 striping, and the same was seen on servers at CERN. 3Ware then released updated firmware that no longer allowed RAID5 striping. Once again the firmware and drivers were downloaded from 3Ware, and the arrays were set back to RAID0 (the original configuration).
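As general background only (nothing here confirms this was the cause of the 3-4 MB/s figure, and the firmware problems may well have contributed), RAID5 keeps an XOR parity block for each stripe. A write that touches only part of a stripe forces a read-modify-write: the controller reads the old data and old parity, then writes the new data and new parity. The Python sketch below illustrates that parity arithmetic; the function names are hypothetical and this is not how the 3Ware firmware is implemented.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of one or more equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_parity(data_blocks: list[bytes]) -> bytes:
    """Parity for a whole stripe: the XOR of every data block in it."""
    return xor_blocks(*data_blocks)


def small_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """New parity after rewriting a single data block.

    The controller must first READ old_data and old_parity, then WRITE
    new_data and the returned parity: two reads and two writes for one
    logical block write (the read-modify-write penalty)."""
    return xor_blocks(old_parity, old_data, new_data)


def rebuild_block(parity: bytes, surviving: list[bytes]) -> bytes:
    """Recover a lost data block from the parity and the surviving blocks."""
    return xor_blocks(parity, *surviving)
```

Because XOR is its own inverse, small_write_parity gives the same parity as recomputing full_stripe_parity over the updated stripe, but it turns every small write into four disk operations, which is one textbook reason RAID5 writes lag behind RAID0.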
CONFIGURATION CYCLE 3:
Since the arrays needed to be set back to RAID0 and a CMSserver workgroup definition had been created in Fermi Red Hat Linux, the reconfiguration of rimbaud was used as a test bed for the CMSserver definition. Tweaking the configuration to meet CMS standards for a server took a couple of weeks of on-and-off work.
Once this was done and the arrays had been formatted and given filesystems, some discrepancies arose. The BIOS showed 635GB for the two eight-disk arrays, but a "df -h" command showed only 496GB. This persisted after several attempts at repartitioning and changing the drive geometry with fdisk. The problem seemed intractable.
At this time CMS was having difficulty with their production and needed rimbaud's resources. The server was given to them with the caveat that a break in production would allow time for fixing the disk problem.
CONFIGURATION CYCLE 4:
Once CMS finished with the server, there was an opportunity to resolve the disk size issue. The search for an answer turned up very few leads. Postings to newsgroups brought no responses, so the possibility was considered that something had been missed because of the configuration Integral had done. An attempt was made to obtain the contents of the /root/integral directory, and this finally succeeded in late June 2001. With the possible exception of a few minor packages, the changes Integral made did not appear to affect the disk issue.
Fabien Collin of CERN, the primary person working on these servers there, was helpful. He suggested that the discrepancy might come from the BIOS and the formatting program counting bytes differently (some programs take 1KB = 1000B and others 1KB = 1024B). He also suggested using cfdisk, as it was a little more robust than fdisk.
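A quick way to see how much of the discrepancy the units alone could explain is to convert the BIOS figure from decimal gigabytes (10^9 bytes) to the binary gigabytes (2^30 bytes) that "df -h" reports. This sketch assumes the BIOS figure is in decimal units, which the document does not confirm.

```python
GB = 1000**3   # decimal gigabyte, as drive vendors usually count
GIB = 1024**3  # binary gigabyte, as "df -h" counts


def decimal_gb_to_gib(gb: float) -> float:
    """Convert a size in decimal GB (10^9 bytes) to binary GiB (2^30 bytes)."""
    return gb * GB / GIB
```

With these definitions, 635 decimal GB comes to about 591.4 binary GB, a gap of roughly 44 GB. That is well short of the difference between 635GB and 496GB, so the byte-counting convention alone cannot account for all of it.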
Working from the list of patches and packages documented in the /root/integral directory, and extrapolating from the .bash_history file left by the previous admin, I installed the newest version of the util-linux RPM, which contains the disk partitioning programs. After partitioning with cfdisk, I had to remake the filesystem and then reboot so that the new partitioning scheme would be recognized.
The filesystem was remade with:
/sbin/mke2fs -b 4096 -m 2
This makes a filesystem with 4096-byte blocks and reserves 2% of the filesystem blocks for the superuser (the mke2fs default is 5%).
After this was done, "df -h" on the system shows:

    Filesystem  Size  Used  Avail  Use%  Mounted on
    /dev/sda1   376G  7.6G  360G    2%   /data1
    /dev/sdb1   601G  336G  252G   57%   /data2
    /dev/sdc1   601G   24k  589G    0%   /data3
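A rough consistency check on the df output above: with "-m 2", about 2% of each filesystem's blocks are reserved for the superuser, and df leaves them out of the Avail column, so Avail should be roughly Size minus Used minus 2% of Size. The sketch below ignores df's rounding and metadata overhead, so it only agrees to within about 1G.

```python
BLOCK_SIZE = 4096   # bytes per block, from "mke2fs -b 4096"
RESERVED_PCT = 2    # percent of blocks reserved for root, from "mke2fs -m 2"


def expected_avail_gib(size_gib: float, used_gib: float,
                       reserved_pct: float = RESERVED_PCT) -> float:
    """Approximate df's Avail column in GiB: ordinary users cannot see the
    superuser's reserved blocks, so they are subtracted along with Used.
    Ignores rounding in "df -h" and the filesystem's metadata overhead."""
    reserved_gib = size_gib * reserved_pct / 100
    return size_gib - used_gib - reserved_gib


def block_count(size_gib: float, block_size: int = BLOCK_SIZE) -> int:
    """Number of block_size-byte blocks in a filesystem of size_gib GiB."""
    return int(size_gib * 1024**3 // block_size)


# /dev/sdc1 from the df output: Size 601G, Used only 24k.
print(round(expected_avail_gib(601, 0)))  # prints 589, matching df's Avail
```

The same arithmetic gives about 253G for /dev/sdb1 and 361G for /dev/sda1, against 252G and 360G reported by df, and a 601G filesystem holds roughly 157.5 million 4096-byte blocks.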
And there was much rejoicing. It turns out that upgrading the util-linux RPM is not necessary, but the proper partition-then-reboot sequence is: make sure you reboot after deleting or adding partitions so that the disk configuration gets reread. Some issues remain. Disk sizes are getting larger, and some drive geometries are not recognized by partitioning programs like fdisk. It is inconvenient, to say the least, to have to reformat, remake filesystems and then reboot a system in order to use an entire disk array. For CMS, I appear to have gotten around this by using the sfdisk command-line partitioning tool from an RPM that gets installed with the CMSserver definition. This will require more testing to make sure it is really useful.
TESTS:
Testing commenced, and the final results are in datatests.070501. Note that read throughput has dropped by about 3-5 MB/s compared with the original Integral setup. It is unclear why this is the case. Integral may have changed kernel configuration parameters to get a little better performance; a set of updated packages may currently be missing; or the tests may not have been performed correctly.
Tony Wildish has supplied a set of disk tests that simulate ORCA and Objectivity data access patterns. These are in the seek-test directory in /root on rimbaud and in the tests directory of the AFS space mentioned above. They have not yet been used to test any of the CMS disks.
UTILITIES:
There is a web utility for configuring and monitoring the 3Ware RAID sets.
This can be downloaded from 3Ware's web site.
DOCUMENTATION:
3Ware provides a manual that gives some minimal information. There is also this document and 3Ware-Howto.txt, which I also wrote.
Here are some tests I would like to see.
Benchmark tests on:
1. 2.2.19 kernel with the 2GB config in the kernel
2. 2.2.19 kernel with the 1GB config in the kernel
3. 2.4 kernel
4. 3w-xxxx driver compiled into the kernel instead of loaded as a module: does this help?
5. What differences are there with write caching enabled?
6. Can we set up user processes that emulate different kinds of data access patterns, e.g. database, regular server, etc.?
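For item 6, a starting point might look like the hypothetical sketch below, which times block reads from a file either sequentially or at random offsets. It is not one of Tony Wildish's seek tests and makes no attempt to mimic ORCA or Objectivity specifically. The test file should be much larger than the machine's RAM (512 MB on rimbaud), as bonnie++ also arranges for its test files, or the page cache will hide the disks.

```python
from __future__ import annotations

import os
import random
import time


def read_pattern(path: str, block_size: int = 64 * 1024, n_reads: int = 256,
                 random_access: bool = False, seed: int = 0) -> tuple[int, float]:
    """Time n_reads reads of block_size bytes from path and return
    (bytes_read, throughput in MB/s, where 1 MB = 10**6 bytes).

    Sequential mode reads consecutive blocks from the start of the file,
    wrapping around at the end; random mode picks block-aligned offsets
    uniformly at random (seeded, so runs are repeatable)."""
    n_blocks = os.path.getsize(path) // block_size
    if n_blocks == 0:
        raise ValueError("file is smaller than one block")
    if random_access:
        rng = random.Random(seed)
        blocks = [rng.randrange(n_blocks) for _ in range(n_reads)]
    else:
        blocks = [i % n_blocks for i in range(n_reads)]

    bytes_read = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        for block in blocks:
            f.seek(block * block_size)
            bytes_read += len(f.read(block_size))
    elapsed = time.perf_counter() - start
    rate = bytes_read / 1e6 / elapsed if elapsed > 0 else float("inf")
    return bytes_read, rate
```

Running read_pattern on the same large file with random_access=False and then random_access=True gives a crude sequential-versus-random comparison; a database-like pattern might use small blocks with random access, and a file-server pattern larger sequential blocks.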
FINAL SOLUTION:
CMS put rimbaud into their production scheme, where it ultimately failed. After it failed multiple times in one week with kernel oopses, possibly caused by bad SCSI drives or one of the cards, CMS decided not to purchase this equipment. The thinking is that it may be an option once the technology matures, but it is not ready for a production situation. SDSS is currently having significant problems with their servers as well, and a discussion with them may be productive.
Please feel free to call or email with any questions.
Thanks,
Joe