The Great Sata Raid

I ran out of space on my home Linux machine. I suppose it started out life as a 350 MHz Pentium II in about 1998, but I think everything has been replaced since then. The current system is a dual processor (two cores total) Athlon 1.8 GHz machine with 1 GB memory. It has a Tyan Tiger MPX motherboard in an Antec Sonata case and a Matrox G550 graphics card. I run Gentoo Linux on it.

The disk system is software RAID-1 (mirrored disks) on a pair of 250 GB drives. These are on the primary IDE port. A third 250 GB drive and a DVD burner are on the secondary IDE.

I started reading in an accumulated pile of Sony MicroMV tapes (mpeg2) from the family camcorder because I am nervous that the camcorder is getting flaky. On tape 20, the disks filled up. The single drive is mostly full of other video (like 150-odd episodes of Good Eats pulled off the Replay TV with DVArchive).

All this is merely setup about why I have been trying to upgrade the storage. My colleague Jeff Darcy solved a similar problem with a QNAP NAS box, but I am kind of cheap and stubborn. Instead, I got a Syba SD-SATA2-2E2I SATA card and a pair of 1 TB Western Digital drives from Newegg. Oh, and I decided to give Ubuntu a try.

Step 1 – Get the controller to work

The first controller was DOA. This is always a little frightening: did I break it? But not this time. I know how to handle the cards, use good anti-static protocols, and so forth. This card, when plugged into any PCI slot, prevented the machine from booting at all. No beeps, no BIOS. Unplug, and it works. The Syba card is a little weird in that it has PCI slot cutouts for both 5 volt and 3.3 volt slots, but according to the Internet, it really only works in 5 volt slots. Fair enough, but dead. Newegg sent an RMA, and, eventually, a new card that behaved much better.

The next huge issue is BIOS extensions. I guess this is a good idea, so that random new controllers can come with the software to run them. The 2E2I comes with a fake RAID BIOS that is just a Bad Idea. I planned to not use it, and instead to use Linux MD software RAID.
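For readers who have not used MD software RAID: the kernel reports the state of each array in /proc/mdstat, and a mirror is healthy when the "[2/2]" counter shows as many active disks as expected. Here is a minimal sketch of reading that counter. The sample text, device names, and block count are made up for illustration, and the exact layout of /proc/mdstat varies a little between kernel versions, so treat the parsing as an assumption rather than a definitive format.

```python
import re

# Sample text in the general shape of /proc/mdstat for a two-disk mirror.
# Device names and the block count are hypothetical, for illustration only.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976759936 blocks [2/2] [UU]

unused devices: <none>
"""


def mirror_status(mdstat_text):
    """Return {array_name: (expected_disks, active_disks)} for each md array."""
    status = {}
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line like "md0 : active raid1 ..."
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The following line carries the "[expected/active]" disk counter.
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            status[current] = (int(counts.group(1)), int(counts.group(2)))
            current = None
    return status


def degraded_arrays(mdstat_text):
    """Return the sorted names of arrays with fewer active disks than expected."""
    return sorted(
        name
        for name, (expected, active) in mirror_status(mdstat_text).items()
        if active < expected
    )


print(mirror_status(SAMPLE_MDSTAT))
print(degraded_arrays(SAMPLE_MDSTAT))
```

On a real system you would pass in the contents of /proc/mdstat instead of the sample string.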

A digression about storage policy.

I’d basically like to save everything forever. I still have files from my grad school days at Stanford. We had a PDP-11/34 running V7 Unix. Since then I’ve been through four companies and a variety of mail systems. I have nested archives, typically underneath trees of directories where each level is named “old stuff” or the equivalent. During my entire career, this stuff has been too large to easily put on off-line media, be it floppies, tapes, CDs, ZIP drives, or whatever. In any case, even if you believe the media are stable, and I don’t, the devices to read old media quickly become unavailable. I think the only real solution is to keep rolling your data forward onto new online devices.

Next. I am acutely aware of single points of failure. I am not really happy with only one of something, and I am not really happy with something that I can’t fix. With this home machine, everything is available commodity parts. I’ve had the memory break, the fans break, the power supply break, the graphics card break, and these are petty annoyances, but my data isn’t really at risk. I can fix the hardware, or if necessary, put the drives into another PC. Appliance NAS boxes make me nervous. It isn’t that the drives are flaky or the software buggy, it is that the rest of the hardware isn’t repairable. It isn’t commodity. If something breaks, you have to hope the company is still in business and deal with slow and expensive repairs. Jeff’s QNAP at least can copy itself to another QNAP, but then you really need two, and they aren’t inexpensive.

So I have a RAID system built with commodity parts, and I back that up to another system that is similar. I’d kind of like the backup to be offsite. Cloud storage for offsite makes sense, but 250 GB to 1 TB is sort of unwieldy. One of the other reasons I have for choosing the Syba card is that it has eSata ports so I can occasionally make copies on 1 TB external drives and leave them, say, at Grandma’s house.

Back to the hardware setup.

Since I don’t want the hardware RAID, I didn’t set it up. I learned from the Web that Ubuntu 8.10 can install onto RAID, but you have to use the Alternate installer. I followed directions, ignored installer complaints about an inability to reread partition tables, and got through the install…and it wouldn’t boot.

In fact, in the BIOS boot sequence screen, the Sata drives wouldn’t show up at all. According to the Internet, you can get around this by declaring one of the drives to be a concatenated drive with one sub-drive, but that did not work. I tried re-installing, thinking that setting up the concatenated drive scrambled grub, but that didn’t work. By the way, the Ubuntu installer should let you repeat individual steps, like installing grub, without wanting to repartition your drives. Just a thought.

Back to the web. Apparently, the controller has two BIOS flash images. The RAID version is standard, but you can reflash the card with the other image, which basically lets you use the controller in JBOD mode. The actual Sata chip is a Silicon Image 3124, and the flash images and installers are on their website. And they only work on DOS or Windows; don’t get me started on that.

I plugged the card into an old Sony Vaio desktop we have for games, and downloaded the installer for Windows, and it didn’t work. It could not find the card at all. The Windows XP device manager knew about it, but the flash installer didn’t. I guessed that the XP driver needed to be installed, and the “check for new hardware” actually worked without requiring a reboot. That never happens. After the driver was running, the flash installer worked ok.

Back to my system. Now the disks show up in the boot menu, but won’t boot. Time for another install. Same warning messages, same lack of booting. This time I applied a Principle of Debugging: Do not ignore errors you don’t understand. I sent the literal error text to Google, and discovered it was from a patch to the Ubuntu installer to prevent it from failing when it encountered remnants of old RAID metadata. My guess is that this stuff was left lying around on the disks by the RAID version of the controller firmware. After more web searching, I found that the dmraid Linux utility, among other things, has the power to delete such things. I got to dmraid by using the install CD to give me a shell on my installed but not working root file system. I erased the RAID junk, but the system still would not boot.
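As a rough illustration of that cleanup step, dmraid can list the fake-RAID metadata it finds with -r and erase it with -r -E. The sketch below only builds and prints the command lines and runs nothing. The device names /dev/sda and /dev/sdb are hypothetical placeholders, not necessarily the devices on this machine, and the erase is destructive, so check the listing before running anything like it by hand as root.

```python
def dmraid_cleanup_commands(devices):
    """Build the dmraid command lines to list, then erase, fake-RAID metadata.

    Nothing is executed here; the caller decides what to do with the result.
    """
    # "dmraid -r" lists the RAID metadata found on the attached disks.
    commands = [["dmraid", "-r"]]
    for dev in devices:
        # "dmraid -r -E <device>" erases the metadata on that one device.
        commands.append(["dmraid", "-r", "-E", dev])
    return commands


# Hypothetical device names, for illustration only.
for cmd in dmraid_cleanup_commands(["/dev/sda", "/dev/sdb"]):
    print(" ".join(cmd))
```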

Next I guessed that grub had not been installed quite correctly, so I installed it manually. I did that wrong too, so that now I have /boot/boot/<stuff> for some reason, but the system can now boot.

Copying old data

Next, I wanted to copy data from my old RAID sets onto the new 960 GB /home partition of the new ones. All this time, I had the primary IDE controller cable unplugged, so that there would be no way for the Ubuntu installer to erase my data. I plugged it in again, and the Sata controller disappeared! The old drives were now visible, and bootable, but once booted, lspci couldn’t see the new controller. Unplug the primary IDE cable and it came back. Weird. The CD drive and my other 250 GB drive on the secondary IDE were working with the Sata drives, but not the primary IDE.

Finally, I just unplugged the secondary, and plugged my old RAID drives into the secondary IDE connector on the motherboard. That worked, and now I have the new terabyte RAID system and my old data at the same time. As we speak, I am rsync’ing the contents onto the new drives.
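A copy like this can be sketched as a small wrapper that builds the rsync command line. This is only an illustration under stated assumptions: the mount point /mnt/oldraid/home is a hypothetical path, not the one used here, and the flags (-a for archive mode, --progress, and --dry-run for a trial pass) are one reasonable choice rather than the exact invocation from this migration.

```python
import shlex


def rsync_command(source, dest, dry_run=True):
    """Build an rsync command that copies the contents of source into dest."""
    # -a (archive) preserves permissions, ownership, timestamps, and symlinks.
    cmd = ["rsync", "-a", "--progress"]
    if dry_run:
        # --dry-run reports what would be copied without writing anything.
        cmd.append("--dry-run")
    # A trailing slash on the source means "the contents of this directory",
    # not the directory itself.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd


# Hypothetical paths: the old RAID set mounted read-only, the new /home.
cmd = rsync_command("/mnt/oldraid/home", "/home/")
print(" ".join(shlex.quote(part) for part in cmd))
```

Running the dry-run form first and reading its output is a cheap way to catch a wrong source or destination before any data moves.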

It should go without saying: this is all way too complicated. And this isn’t really my primary computer anymore. Mostly I use a 17″ MacBook Pro, which I back up with an Apple Time Capsule with a 1 TB drive. Yes, it doesn’t really meet my goals of repairability, but it just works.