Big Data

I propose a definition of Big Data.  Big Data is stuff that you cannot process within the MTBF of your tools.

Here’s the story about making a backup of a 1.1 Terabyte filesystem with several million files.

A few years ago, Win and I built a set of home servers out of mini-ATX motherboards with Atom processors and dual 1.5 Terabyte drives.  We built three: one for Win’s house, which serves as the compound IMAP server and suchlike; one for my house, which mostly holds data and a duplicate DHCP server and suchlike; and one, called sector9, which holds the master copy of the various open source SiCortex archives.

These machines are so dusty that it is no longer possible to run apt-get update, and so we’re planning to just reinstall more modern releases.  In order to do that, it is only prudent to have a couple of backups.

In the case of sector9, it has a pair of 1.5 T drives set up as RAID 1 (mirrored).  We also have a 1.5T drive in an external USB case as a backup device.  The original data is still on a 1T external drive, but with the addition of this and that, the size of sector9’s data had grown to 1.1T.

I decided to make a new backup.  We have a new Drobo5N NAS device, with 3 3T drives, set up for single redundancy, giving it 6T of storage.  Using 1.1T for this would be just fine.

There have been any number of problems.

Idea 1 – mount the Drobo on sector9 and use cp -a or rsync to copy the data

The Drobo supports only AFP (Apple Filing Protocol) and CIFS (Windows file sharing).  I could mount the Drobo on sector9 using Samba, except that sector9 doesn’t already have Samba, and apt-get won’t work due to the age of the thing.

Idea 2 – mount the Drobo on my Macbook using AFP, and mount sector9 on the Macbook using NFS.

Weirdly, I had never installed the necessary packages on sector9 to export filesystems using NFS.

Idea 3 – mount the Drobo on my Macbook using AFP and use rsync to copy files from sector9.

This works, for a while.  The first attempt ran at about 3 MB/second, and copied about 700,000 files before hanging, for some unknown reason.  I got it unwedged somehow, but not trusting the state of everything, rebooted the Macbook before trying again.

The second time, rsync took a couple of hours to figure out where it was, and resumed copying, but only survived a little while longer before hanging again. The Drobo became completely unresponsive.  Turning it off and on did not fix it.

I called Drobo tech support, and they were knowledgeable and helpful.  After a long sequence of steps, involving unplugging the drives and restarting the Drobo without the mSata SSD plugged in, we were able to telnet to its management port, but the Drobo Desktop management application still didn’t work. That was in turn resolved by uninstalling and reinstalling Drobo Desktop (on a Mac! Isn’t this disease limited to PCs?)

At this point, Drobo tech support asked me to use the Drobo Desktop feature to download the Drobo diagnostic logs and send them in….but the diagnostic log download hung.  Since the Drobo was otherwise operational, we didn’t pursue it at the time.  (A week later, I got a followup email asking me if I was still having trouble, and this time the diagnostic download worked, but the logs didn’t show any reason for the original hang.)

By the way, while talking to Drobo tech support, I discovered a wealth of websites that offer extra plugins for Drobos (which run some variant of linux or bsd).  They include an nfs server, but using it kind of voids your tech support, so I didn’t.

A third attempt to use rsync ran for a while before mysteriously failing as well.  It was clear to me that while rsync will synchronize two filesystems, it might never finish if it has to check its work from the beginning and doesn’t last long enough to finish.

I was also growing nervous about the second problem with the Drobo, that it uses NTFS, not a linux filesystem.  As such, it was not setting directory dates, and was spitting warnings about symbolic links.  Symbolic links are supposed to work on the Drobo.  In fact, I could use ln -s in a Macbook shell just fine, but what shows up in a directory listing is subtly different from what shows up in a small rsync of linux symbolic links.

Idea 4:  Mount the Drobo on qadgop (my other server, which does happen to have Samba installed) and use rsync.

This again failed to work for symbolic links, and a variety of attempts to change the linux smb.conf file in ways suggested by the Internet didn’t fix it.  There were suggestions to root the Drobo and edit its configuration files, but again, that made me nervous.

At this point, my problems are twofold:

  • How to move the bits to the Drobo
  • How to convince myself that any eventual backup was actually correct.

I decided to create some end-to-end check data, by using find and md5sum to create a file of file checksums.

First, I got to wondering how healthy the disk drives on sector9 actually were, so I decided to try SMART. Naturally, the SMART tools for linux were not installed on sector9, but I was able to download the tarball and compile them from sources.  Alarmingly, SMART told me that, for various reasons I didn’t understand, both drives were likely to fail within 24 hours.  It also told me the external USB drive was fine.  Did that drive really hold a current backup?  The date on the masking tape on the drive said 5/2012 or so, about a year old.

I started find jobs running on both the internal drives and the external:

find . -type f -exec md5sum {} \; >s9.md5
find . -type f -exec md5sum {} \; >s9backup.md5

These jobs actually ran to completion in about 24 hours each.  I now had two files, like this:


root@sector9:~# ls -l *.md5
-rw-r--r-- 1 root root 457871770 2013-07-08 01:24 s9backup.md5
-rw-r--r-- 1 root root 457871770 2013-07-07 21:39 s9.md5
root@sector9:~# wc s9.md5
3405297 6811036 457871770 s9.md5

This was encouraging: the files were the same length.  But diffing 450 MB files is not for the faint of heart, especially since find doesn’t enumerate them in the same order.  I had to sort each file, then diff the sorted files.  This took a while, but in fact the sector9 filesystem and its backup were identical.  I resolved to use this technique to check any eventual Drobo backup.  It also relieved my worries that the internal drives might fail at any moment.  I also learned that the sector9 filesystem had about 3.4 million files on it.
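The manifest-and-compare check can be sketched as a pair of shell functions.  This is a minimal sketch, not the exact commands I ran: the names make_manifest and compare_manifests are made up for illustration, it assumes md5sum and a POSIX shell, and sorting on the path column is my choice so that a file whose contents differ shows up as an adjacent pair in the diff.

```shell
#!/bin/sh
# make_manifest DIR: print "checksum  ./relative/path" for every regular file
# under DIR. Running find from inside DIR keeps the paths relative, so
# manifests taken from different mount points can be compared directly.
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + )
}

# compare_manifests A B: sort both manifests by path, then diff them.
# Returns diff's status: 0 if identical, 1 if they differ.
compare_manifests() {
    a_sorted=$(mktemp) || return 2
    b_sorted=$(mktemp) || { rm -f "$a_sorted"; return 2; }
    LC_ALL=C sort -k 2 "$1" > "$a_sorted"
    LC_ALL=C sort -k 2 "$2" > "$b_sorted"
    diff "$a_sorted" "$b_sorted"
    status=$?
    rm -f "$a_sorted" "$b_sorted"
    return "$status"
}

# Hypothetical use; the mount points are placeholders:
#   make_manifest /mnt/source > source.md5
#   make_manifest /mnt/backup > backup.md5
#   compare_manifests source.md5 backup.md5 && echo "identical"
```

Because the paths are recorded relative to the directory being scanned, two manifests made this way should not need their path names rewritten before the diff.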

Idea 5: Create a container file on the Drobo, with an ext2 filesystem inside, and use that to hold the files.

This would solve the problem of putting symbolic links on the Drobo filesystem (even though it is supposed to work!) It would also fix the problem of NTFS not supporting directory timestamps or linux special files.  I was pretty sure there would be root filesystem images in the sector9 data for the SiCortex machine and for its embedded processors, and I would need special files.

But how to create the container file? I wanted a 1.2 Terabyte filesystem, slightly bigger than the actual data used on sector9.

According to the InterWebs, you use dd(1), like this:

dd if=/dev/zero of=container.file bs=1M seek=1153433 count=0

I tried it:

dd if=/dev/zero of=container.file bs=1M seek=1153433

It seemed to take a long time, so I thought probably it was creating a real file, instead of a sparse file, and went to bed.

The next morning it was still running.

That afternoon, I began getting emails from the Drobo that I should add more drives, as it was nearly full, then actually full.  Oops. I had left off the count=0.

Luckily, deleting a 5 Terabyte file is much faster than creating one!  I tried again, and the dd command with count=0 ran very quickly.
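A way to avoid repeating this mistake is to try the command on a small file first and compare the apparent size with the space actually allocated.  The following is a sketch under assumptions: the function names are hypothetical, the block size is spelled out in bytes because not every dd accepts the 1M suffix, and whether the file really ends up sparse depends on the filesystem it lands on.

```shell
#!/bin/sh
# make_sparse FILE SIZE_MIB: create FILE with an apparent size of SIZE_MIB
# mebibytes without writing any data blocks. count=0 tells dd to copy nothing;
# seek= only moves the output offset, and dd then sets the file length there.
make_sparse() {
    dd if=/dev/zero of="$1" bs=1048576 seek="$2" count=0 2>/dev/null
}

# apparent_bytes FILE: the length recorded for the file.
apparent_bytes() {
    wc -c < "$1" | tr -d ' '
}

# allocated_kib FILE: the space actually occupied on disk, as reported by du.
allocated_kib() {
    du -k "$1" | awk '{ print $1 }'
}

# Hypothetical trial run with a 10 MiB file before committing to a big one:
#   make_sparse trial.file 10
#   apparent_bytes trial.file    # length in bytes
#   allocated_kib trial.file     # should be far smaller if the file is sparse
```

If the allocated figure tracks the apparent size, the file is being written for real, and it is time to stop before the NAS fills up.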

I thought that MacOS could create the filesystem, but I couldn’t figure out how.  I am not sure that MacOS even has something like the linux loop device, and I couldn’t figure out how to get DiskUtility to create a unix filesystem in an image file.

I mounted the Drobo on qadgop, using Samba, and then used the linux loop device to give device level access to the container file, and I was able to mkfs an ext2 filesystem on it.
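The loop device steps look roughly like the sketch below.  It is not a record of what I typed: the run and setup_container helpers and the /mnt/container mount point are made up for illustration, and the commands need root.  The sketch only prints the commands unless DRY_RUN=0 is set, because mkfs destroys whatever is already in the container file.

```shell
#!/bin/sh
# run COMMAND [ARGS...]: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# setup_container IMAGE LOOPDEV MOUNTPOINT: attach the image file to a loop
# device, make an ext2 filesystem on it (this erases the image), and mount it.
setup_container() {
    image=$1
    loopdev=$2
    mountpoint=$3
    run losetup "$loopdev" "$image" &&
    run mkfs.ext2 "$loopdev" &&
    run mount "$loopdev" "$mountpoint"
}

# Hypothetical use, printing the three commands without running them:
#   setup_container container.file /dev/loop0 /mnt/container
```

On later mounts the mkfs step has to be skipped, so in practice only the losetup and mount lines are repeated.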

Idea 6: Mount the container file on MacOS and use rsync to write files into it.

I couldn’t figure out how to mount it!  Again, MacOS seems to lack the loop device.  I tried using DiskUtility to pretend my container file was a DVD image, but it seems to have hardwired the notion that DVDs must have ISO filesystems.

Idea 7: Mount the Drobo on linux, loop mount the container, USB mount the sector9 backup drive.

This worked, sort of.  I was able to use rsync to copy a million files or so before rsync died.  Restarting it got substantially further, and a third run appeared to finish.

The series of rsyncs took several days to run.  Sometimes they would run at about 3 MB/sec, and sometimes at about 7 MB/sec.  No idea why.  The Drobo will accept data at 11 MB/sec using AFP, so perhaps this was due to slow performance of the USB drive.  The whole copy took close to 83 hours, which is roughly what 1.1 T predicts at an average rate between 3 and 4 MB/sec.
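As a rough check on figures like these, the arithmetic can be done with awk.  This sketch assumes decimal units (1 T is 10^12 bytes, 1 MB is 10^6 bytes) and a steady rate, and the function names are hypothetical.  Under those assumptions 1.1 T takes about 102 hours at 3 MB/sec and about 44 hours at 7 MB/sec, and an 83 hour copy implies an average of about 3.7 MB/sec.

```shell
#!/bin/sh
# transfer_hours BYTES BYTES_PER_SEC: hours needed at a steady rate.
transfer_hours() {
    awk -v bytes="$1" -v rate="$2" \
        'BEGIN { printf "%.1f\n", bytes / rate / 3600 }'
}

# average_rate_mb BYTES HOURS: average MB/sec (decimal) implied by a duration.
average_rate_mb() {
    awk -v bytes="$1" -v hours="$2" \
        'BEGIN { printf "%.1f\n", bytes / (hours * 3600) / 1000000 }'
}

transfer_hours 1100000000000 3000000    # 1.1 T at 3 MB/sec
transfer_hours 1100000000000 7000000    # 1.1 T at 7 MB/sec
average_rate_mb 1100000000000 83        # rate implied by an 83 hour copy
```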

Unfortunately, df said the container filesystem was 100% full and the final rsync had errors “previously reported” but scrolled off the screen. I am pretty sure the 100% is a red herring, because linux likes to reserve 10% of space for root, and the container file was sized to be more than 90% full.

I reran the rsync, under a script(1) to get a log file, and found many errors of the form “can’t mkdir <something or other>”.

Next, I tried mkdir by hand, and it hung.  Oops.  Ps said it was stalled in state D, which I know to be disk wait.  In other words, the ext2 filesystem was damaged.  By use of kill -9 and waiting, I was able to unmount the loop device and Drobo, and remount the Drobo.

Next, I tried using fsck to check the container filesystem image.

fsck takes hours to check a 1.2T filesystem.  Eventually, it started asking me about random problems and could I authorize it to fix them?  After typing “y” a few hundred times, I gave up, killed the fsck, and restarted it as fsck -p to fix problems automatically.  Recall that I don’t actually care if it is perfect, because I can rerun rsync and check the final results using my md5 checksum data.

The second attempt to run fsck didn’t work either:

root@qadgop:~# fsck -a /dev/loop0
fsck 1.41.4 (27-Jan-2009)
/dev/loop0 contains a file system with errors, check forced.
/dev/loop0: Directory inode 54583727, block 0, offset 0: directory corrupted

/dev/loop0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)

Hoping that the fsck -a had fixed most of the problems, I ran it a third time, again without -a, but I wound up typing ‘y’ a few hundred more times.  fsck took about 300 minutes of CPU time on the Atom to do this work and left 37 MB worth of files and directories in /lost+found.
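Two things might have saved some typing here.  e2fsck has a -y option that assumes “yes” to every question, and fsck reports what happened through bit flags in its exit status.  The decoder below is a sketch: the function name is made up, and the meanings are the ones listed in the e2fsck(8) manual page, which other filesystem checkers may not follow.

```shell
#!/bin/sh
# describe_fsck_status STATUS: decode the bit flags in an fsck exit status,
# following the list in the e2fsck(8) manual page.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    if [ $((status & 1)) -ne 0 ]; then out="$out errors-corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then out="$out reboot-needed"; fi
    if [ $((status & 4)) -ne 0 ]; then out="$out errors-left-uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then out="$out operational-error"; fi
    if [ $((status & 16)) -ne 0 ]; then out="$out usage-error"; fi
    if [ $((status & 32)) -ne 0 ]; then out="$out cancelled"; fi
    if [ $((status & 128)) -ne 0 ]; then out="$out shared-library-error"; fi
    echo "${out# }"
}

# Hypothetical use on the loop device from the transcript above:
#   e2fsck -y /dev/loop0
#   describe_fsck_status $?
```

Since the checksum comparison is the real test of the copy, letting fsck answer its own questions seems an acceptable risk in this situation.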

With the container filesystem repaired, I started a fourth rsync, which actually finished, transferring another 93 MB.

Next step – are the files really all there and all the same?  I’ll run the find -exec md5sum to find out.

Um.  Well.  What does this mean?

root@qadgop:~# wc drobos9.md5 s9.md5
3526171 7052801 503407914 drobos9.md5
3405297 6811036 457871770 s9.md5

The target has 3.5 million files, while the source has 3.4 million files!  That doesn’t seem right.  An hour of running “du” and comparing the top few levels of directories shows that while rerunning rsync to finish interrupted copies works, you really have to use the same command lines.  I had what appeared to be a complete copy one level below a partial copy.  After deleting the extra directories, and using fgrep and sed to rewrite the path names in the file of checksums, I was finally able to do a diff of the sorted md5sum files:

Out of 3.4 million files, there were 8 items like this:

< 51664d59ab77b53254b0f22fb8fdb3a8 ./sicortex-archive/stash/97/sha1_97e18c8e2261b09e21b0febd75f61635d7631662_64088060.bin

> 127cc574dcb262f4e9e13f9e1363944e ./sicortex-archive/stash/97/sha1_97e18c8e2261b09e21b0febd75f61635d7631662_64088060.bin
1402503c1402502

and one like this:

> 8d9364556a7891de1c9a9352e8306476  ./downloads.sicortex.com/dreamhost/ftp.downloads.sicortex.com/ISO/V3.1/.SiCortex_Linux_V3.1_S_Disk1_of_2.iso.eNLXKu

The second one is easier to explain: it is a partially completed rsync, so I deleted it.  The other 8 appear to be files that were copied incorrectly!  I should have checked the lengths, because these could be copies that failed due to running out of space, but I just reran rsync on those 8 files in --checksum mode.
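Picking the mismatched files out of two manifests can be automated.  The sketch below is an illustration rather than what I actually did: changed_paths is a made-up name, and it assumes standard md5sum output, where the path starts at column 35 (32 hex digits plus two separator characters).  It also assumes the first manifest is not empty.

```shell
#!/bin/sh
# changed_paths SOURCE_MANIFEST COPY_MANIFEST: print every path that appears
# in both manifests but with different checksums.
changed_paths() {
    awk 'NR == FNR { sum[substr($0, 35)] = $1; next }
         {
             path = substr($0, 35)
             if ((path in sum) && sum[path] != $1) print path
         }' "$1" "$2"
}

# Hypothetical use; the directories are placeholders. The list of bad paths
# is handed to rsync, which recopies just those files:
#   changed_paths s9.md5 drobos9.md5 > recopy.txt
#   rsync -a --checksum --files-from=recopy.txt /mnt/source/ /mnt/copy/
```

Paths that appear in only one manifest, like the leftover temporary file above, are deliberately ignored here and still need a look by hand.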

Final result: 1.1 Terabytes and 3.4 million files copied.  Elapsed time, about a month.

What did I learn?

  • Drobo seems like a good idea, but systems that ever need tech support intervention make me nervous.  My remaining worry about it is proprietary hardware.  I don’t have the PC ecosystem to supply spare parts.  Perhaps the right idea is to get two.
  • Use linux filesystems to hold linux files.  It isn’t just Linus’ and his files that vary only in capitalization, it is also the need to hold special files and symlinks. Container files and loop mounting work fine.
  • Keep machines updated. We let these get so far behind that we could no longer install new packages.
  • A meta-rsync would be nice, that could use auxiliary data to manage restarts.
  • Filesystems really should have end-to-end checksums.  ZFS and BTRFS seem like good ideas.
  • SMB, or CIFS, or the Drobo, or AFP is not good at metadata operations.  Writing large numbers of individual files to the Drobo failed no matter how I tried it, while SMB read-write access to a single big file seems to be perfectly reliable.
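Short of a real meta-rsync, a small wrapper helps with two of the problems above: it reruns the identical command line on every restart, and it keeps the errors in a log instead of letting them scroll off the screen.  This is a minimal sketch with a hypothetical name, retry_copy.  It does not add any auxiliary restart data, so each rsync attempt still rescans from the beginning.

```shell
#!/bin/sh
# retry_copy MAX_ATTEMPTS COMMAND [ARGS...]: rerun the identical command line
# until it exits 0 or the attempt limit is reached. Output is appended to the
# file named by LOG (default: copy.log).
retry_copy() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        echo "attempt $attempt: $*" >> "${LOG:-copy.log}"
        if "$@" >> "${LOG:-copy.log}" 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Hypothetical use; the paths are placeholders, not the ones from this story:
#   retry_copy 10 rsync -a /mnt/s9backup/ /mnt/container/
```

A wrapper like this cannot rescue a copy from a hung mount, since a command stuck in disk wait never returns, but it does cover the case where rsync simply dies.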

 

2 thoughts on “Big Data”

  1. Did you consider running Linux in a VM on the MacBook? This would allow you to create a file in a Mac filesystem that is a dd-able Linux fs image of whatever flavor you’d like (ext2, ext3, etc). Since speed doesn’t seem to be of the essence, the extra overhead (which is close to marginal anyhow) wouldn’t matter.

    1. I didn’t think of it. I have an Ubuntu VM lying around, and that wouldn’t have been hard. I’m puzzled why it didn’t occur to me, since I had just been using VMWare Fusion to run Windows to recover lost email.
