wget

I am struggling here to decide whether the Bradley Manning prosecutors are disingenuous or just stupid.
I am reacting here to Cory Doctorow’s report that the government’s lawyers accuse Manning of using that criminal spy tool wget.
Notes from the Ducking Stool
I am hoping for stupid, because if they are suggesting to the jury facts they know not to be true, then that is a violation of ethics, their oaths of office, and any concept of justice.
Oh, and wget is exactly what I used, the last time I downloaded files from the NSA.
Really.
A while back, the back issues of the NSA internal newsletter Cryptolog were declassified so I downloaded the complete set.  I think the kids are puzzled about why I never mind having to wait in the car for them to finish something or other, but it is because I am never without a large collection of fascinating stuff.
Here’s how I got them, after scraping the URLs out of the agency’s HTML:
wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_01.pdf
wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_02.pdf
wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_03.pdf
wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_04.pdf

. . .
wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_132.pdf
wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_133.pdf
wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_134.pdf
wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_135.pdf
wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_136.pdf
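For what it's worth, 136 nearly identical commands are a good fit for a loop. The following is only a sketch: it assumes the filenames really do follow the zero-padded pattern visible above (cryptolog_01.pdf through cryptolog_136.pdf), which is a guess from the listing rather than something checked against the scraped URLs. The function name cryptolog_urls is made up for this example.

```shell
# Print one Cryptolog URL per line for issues $1 through $2.
# Assumption: names are zero-padded to at least two digits, as in the listing above.
cryptolog_urls() {
    i=$1
    while [ "$i" -le "$2" ]; do
        printf 'http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_%02d.pdf\n' "$i"
        i=$((i + 1))
    done
}

# Usage (hits the network, so it is left commented out here):
# cryptolog_urls 1 136 | wget -i -
```

wget's -i option reads a list of URLs from a file, and "-" means standard input, so the whole set comes down with one invocation.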

Email Disaster Recovery and Travel adventures

Cathy is off to China for a few weeks. She wanted email access, but not with her usual laptop.
She uses Windows Vista on a plasticky HP laptop from, well, the Vista era.  It is quite heavy, and these days quite flaky.  It has a tendency to shut down for no obvious reason, other than maybe age, and being Vista running on a plasticky HP laptop.
I set up the iPad, but Cathy wanted a more familiar experience, and needed IE in order to talk to a secure webmail site, so we dusted off an Asus EEE netbook running Windows XP.
I spent a few hours trying to clear off several years of accumulated crapware, such as three different search toolbars attached to Internet Explorer, then gave up and re-installed XP from the recovery partition.  123 Windows Updates later, it seemed fine, but still wouldn’t talk to the webmail site.  It turns out that Asus thoughtfully installed the open source local proxy server Privoxy, with no way to uninstall it.  If you run the Privoxy uninstall, it leaves you with no web access at all.  I finally found Interwebs advice to also uninstall the Asus parental controls software, and that fixed it.
Next, I installed Thunderbird, and set it up to work with Cathy’s account on the family compound IMAP server.  I wanted it to work offline, in case of spotty WiFi access in China.  After setting that up, I “unsubscribed” from most of the IMAP folders and let it download.  Now Cathy’s inbox has 34,000 messages in it, and I got to thinking “what about privacy?”  After all, governments, especially the United States, claim the right to search all electronic devices at the border, and it is also commonly understood that any electronic device you bring to China can be pwned before you come back.
Then I found a setting that tells Thunderbird to download only the last so many days for offline use.  Great!  But it had already downloaded all 6 years of back traffic.  Adjacent, there is a setting for “delete mail more than 20 days (or whatever) old.”
You know what happens next!  I turned that on, and Thunderbird started deleting all Cathy’s mail, both locally and on the server.  Now, there is fine print (farther down the page) that explains this will happen, but I didn’t read it.
Parenthetically, this is an awful design.  It really looks like a control associated with how much mail to keep for offline use, but it is not.  It is a dangerous, unguarded, unconfirmed command that does irreversible damage.
I thought this was taking too long, but by the time I figured it out, it was way too late.
So, how to recover?
I have been keeping parallel feeds from Cathy’s email, but only since March or so, since I’ve been experimenting with various spam suppression schemes.
I had made a copy of Cathy’s .maildir on the server, but it was from 2011.
But wait! Cathy’s laptop was configured for offline use, and had been turned off.  Yes!  I opened the lid and turned off WiFi as quickly as possible, before it had a chance to sync.  (Actually, the HP has a mechanical switch to turn off WiFi, but I didn’t know that.)  I then changed the username/password on her laptop Thunderbird to stop further syncing.
Next, since the horse was well out of the barn, I made a snapshot of the server .maildir, and of the HP laptop’s Thunderbird profile directories. Now, whatever I did, right or wrong, I could do again.
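A snapshot of the server side can be as simple as a timestamped tarball. This is a minimal sketch, not the exact commands used here: the function name snapshot_dir and the paths in the usage line are placeholders, and it assumes a Unix-like server with tar and gzip available.

```shell
# snapshot_dir SRC DESTDIR
# Writes DESTDIR/<name>-<timestamp>.tar.gz containing SRC and prints the archive path.
snapshot_dir() {
    src=$1
    destdir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    name=$(basename "$src")
    # Strip a leading dot so the archive of ".maildir" is not itself a hidden file
    archive="$destdir/${name#.}-$stamp.tar.gz"
    mkdir -p "$destdir" || return 1
    # -C keeps absolute paths out of the archive
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
    # Listing the archive back forces a full read, a cheap check that it is usable
    tar -tzf "$archive" >/dev/null || return 1
    printf '%s\n' "$archive"
}

# Usage (hypothetical paths):
# snapshot_dir /home/cathy/.maildir /mnt/usb/snapshots
```

The point of the timestamp is that every later experiment can start from a known state, and a second snapshot never overwrites the first.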
Time for research!
What I wanted to do seemed clear:  restore the off-line saved copies of the mail from the HP laptop to the IMAP server.  This is not a well-travelled path, but there is some online advice:
http://www.fahim-kawsar.net/blog/2011/01/09/gmail-disaster-recovery-syncing-mail-app-to-gmail-imap/
https://support.mozillamessaging.com/en-US/kb/profiles
The general idea is:

  1. Disconnect from the network
  2. Make copies of everything
  3. While running in offline mode, copy messages from the cached IMAP folders to “Local” folders
  4. Reconnect to the network and sync with the server. This will destroy the cached IMAP folders, but not the new Local copies
  5. Copy from the Local folders back to the IMAP server folders

Seems simple, but in my case, there were any number of issues:

  • Not all server folders were “subscribed” by Thunderbird, and I didn’t know which ones were
  • The deletion was interrupted at some point
  • I didn’t want duplicated messages after recovery
  • INBOX was 10.3 GB (!)
  • The Thunderbird profile altogether was 23 GB (!)
  • The HP laptop was flakey
  • Cathy’s about to leave town, and needs last minute access to working email
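On the duplicates worry: one rough way to spot them is to look for Message-ID values that occur more than once across mbox files. The sketch below is an assumption-laden helper (duplicate_message_ids is a made-up name), not something Thunderbird provides. It only sees Message-ID headers that sit on a single line, and it will also pick up the IDs of messages forwarded as attachments, so treat its output as candidates to inspect rather than a verdict.

```shell
# duplicate_message_ids FILE...
# Prints each Message-ID value that appears more than once across the given mbox files.
duplicate_message_ids() {
    grep -h -i '^Message-ID:' "$@" |
        tr -d '\r' |
        sed -e 's/^[^:]*:[[:space:]]*//' -e '/^$/d' |
        sort | uniq -d
}

# Usage (hypothetical file names, run against copies):
# duplicate_message_ids copy-of-Inbox copy-of-Inbox-2012
```

An empty result means no repeated IDs were found among the headers this simple match can see.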

One thing at a time.
Tools
I found out about  “MozBackup” and used it to create a backup copy of the HP laptop’s profile directory.
MozBackup
MozBackup creates a zip file of the contents of a Thunderbird profile directory, and can restore them to a different profile on a different computer, making configuration changes as appropriate. This is much better than hand-editing the various Thunderbird configuration files.
Hardware problems
As I mentioned, the HP laptop is sort of flakey.  I succeeded in copying the Thunderbird profile directory, but 23 GB worth of copying takes a long time on a single 5400 rpm laptop disk.  I tried copying to a Mybook NAS device, but it was even slower.  What eventually worked, not well, but adequately, was copying to a 250GB USB drive.
I decided to leave the HP out of it, and to do the recovery on the netbook, the only other Windows box available.  I was able to create a second profile on the netbook, and restore the saved profile to it, slowly, but I realized Cathy would leave town before I finished all the steps, taking the netbook with her.  Back to the HP.
First I tried just copying the mbox files and .msf index files from the ImapMail subfolder into Local Folders. This seemed to work, but Thunderbird got very confused about it.  It said there were 114000 messages in Inbox, rather than 34000.  This shortcut is a dead end.
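When the displayed count looks wrong, it can help to count what is physically in the mbox file, independent of Thunderbird's .msf index. In mbox format each message starts with a line beginning "From ", so counting those lines gives the raw total. A minimal sketch (mbox_count is a made-up name); note that the raw total can legitimately exceed what the folder pane shows, because an mbox file keeps deleted messages until the folder is compacted.

```shell
# mbox_count FILE
# Counts "From " separator lines, i.e. every message physically present in the file,
# including any that are marked deleted but not yet compacted away.
mbox_count() {
    grep -c '^From ' "$1"
}

# Usage (hypothetical path, run against a copy):
# mbox_count /path/to/copy/of/Inbox
```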
I created a new profile on the HP, and restored the backup using MozBackup (which took 2 hours), and started it in offline mode.  I then tried to “select-all” in Inbox to copy them to a local folder.  Um.  No.  I couldn’t even get control back.  Thunderbird really cannot select 34000 messages and do anything.
Because I was uncertain about the state of the data, I restored the backup again (another 2 hours).
This time, I decided to break up Inbox into year folders, each holding about 7000 messages.  The first one worked, but then the HP did an unexpected shutdown during the second, and when it came back, Inbox was empty! The Inbox mbox file had been deleted.
I did another restore, and managed to create backup files for 2012 and 2011 messages, before it crashed again. (And Inbox was gone AGAIN.)
The technique seemed likely to eventually work, but it would drive me crazy.  Or crazier.
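The same split by year can, in principle, be done outside Thunderbird on a copy of the mbox file. The sketch below is not what was done here; it is a hedged illustration using awk, with a made-up function name and output naming (inbox-YYYY.mbox). It assumes body lines starting with "From " have been escaped by the mail client, and it takes the year from the first four-digit year in each message's Date: header. Whether Thunderbird would accept the resulting files dropped into Local Folders is a separate question, given the confusion described above.

```shell
# split_mbox_by_year MBOX OUTDIR
# Writes OUTDIR/inbox-YYYY.mbox, one file per year found in the Date: headers.
# Messages with no parsable Date: header go to OUTDIR/inbox-unknown.mbox.
split_mbox_by_year() {
    mkdir -p "$2" || return 1
    awk -v outdir="$2" '
        # Write the buffered message to the file for its year, then reset.
        function emit(    i, out) {
            if (n == 0) return
            out = outdir "/inbox-" (year == "" ? "unknown" : year) ".mbox"
            for (i = 1; i <= n; i++) print buf[i] > out
            n = 0; year = ""; inhdr = 1
        }
        BEGIN { n = 0; year = ""; inhdr = 1 }
        /^From / { emit() }                # separator line starts a new message
        { buf[++n] = $0 }                  # buffer one message at a time
        inhdr && /^\r?$/ { inhdr = 0 }     # first blank line ends the headers
        inhdr && year == "" && /^[Dd]ate:/ {
            if (match($0, /(19|20)[0-9][0-9]/)) year = substr($0, RSTART, 4)
        }
        END { emit() }
    ' "$1"
}

# Usage (hypothetical paths, always on a copy, never the live profile):
# split_mbox_by_year /path/to/copy/of/Inbox /path/to/split
```

Only one message is held in memory at a time, so a 10 GB Inbox is slow to read but does not need 10 GB of RAM.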
I was now accumulating saved Local Folder files representing 3 specific years of Inbox.  I still had to finish the rest, deal with Sent Mail, and audit about 50 other subfolders to see if they needed to be recovered.
I wasn’t too worried about all the archived subfolders, since they hadn’t changed in ages and were well represented by my 2011 copy of Cathy’s server .maildir.
Digression
What about server backups?  Embarrassing story here!  Back in 2009, Win and I built some nice mini-ATX Atom-based home servers with dual 1.5T disks run in mirrored mode.  Win’s machine runs the IMAP, and mine mostly has data storage.  Each machine has the mirrored disks for reliability and a 1.5T USB drive for backup.  The backups are irregularly kept up to date, and in the IMAP machine’s case, not recently.
About 6 months ago, I got a family pack of CrashPlan for cloud backup, and I use it for my Macbook and for my (non IMAP) server, but we had never gotten around to setting up CrashPlan for either Cathy’s laptop or the IMAP server.
A few months ago, we got a Drobo 5N, and set it up with 3 3T disks, for 6T usable storage, but we haven’t gotten it working for backup either.  (I am writing another post about that.)
So, no useful server backups for Cathy’s mail.
Well now what?
I have a nice Macbook Pro.  Unfortunately, the 500 GB SSD has 470 GB of data on it, which does not leave enough room for one copy of Cathy’s cached mail, let alone two.  I thought about freeing up space, and copied a 160 GB Aperture photo library to two other systems, but it made me nervous to delete it from the Macbook.
I then tried using Mac Thunderbird to set up a profile on that 250 GB external USB drive.  It didn’t work, because the FAT filesystem couldn’t handle Mac Thunderbird’s need for fancy filesystem features like ACLs, but the attempt triggered an idea!
First, I was nervous about using Mac Thunderbird to work on backup data from a PC. I know that Thunderbird profile directories are supposed to be cross-platform, but the config files like profiles.ini and prefs.js are littered with PC pathnames.
Second, the USB drive is slow, even if it worked.
Up until recently, I had been using a 500 GB external Firewire drive for Time Machine backups of the Macbook.  It was still full of Time Machine data, but I’ve switched to using a 1T partition on the Drobo for Time Machine.  I also have the CrashPlan backup.  So I reformatted the Firewire drive to HFS, and plugged it in as extra storage.
Also on the Macbook is VMWare Fusion, and one of my VMs is a 25 GB instance running XP Pro.
I realized I should be able to move the VM to the Firewire drive, and expand its storage by another 50 GB or so to have room to work on the 23 GB Thunderbird data.
To the Bat Cave!
It turns out to be straightforward to copy a VMWare image to another place, and then run the copy.  Rather than expand the 25GB primary disk, I just added a second virtual drive and used XP Disk management to format it as drive E.  I also used VMWare sharing to share access to the underlying Mac filesystem on the Firewire drive.

  1. Copy VMWare image of XP to the Firewire drive
  2. Copy MozBackup save file of the cached IMAP data and the various Local Folders files to the drive
  3. Create second disk image for XP
  4. Run XP under VMWare Fusion on the Macbook, using the Firewire drive for backing store
  5. Install Thunderbird and MozBackup
  6. Use MozBackup to restore Cathy’s cached local copies of her mail from the flakey HP laptop
  7. Copy the Local Folders mailbox files for 2013, 2012, and 2011 into place.
  8. Use XP Thunderbird running under VMWare to copy the rest of the cached IMAP data into Local Folders.
  9. By hand, compare message counts of all 50 or so other IMAP folders in the cached copy with those still on the server, and determine they were still correct.
  10. Go online, letting Thunderbird sync with the server, deleting all the locally cached IMAP data.
  11. Create IMAP folders for 2007 through 2013, plus Sent Mail and copy the roughly 40000 emails back to the server.
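The by-hand count comparison in step 9 is tedious for 50 folders, and the server half of it can be scripted. This is a sketch under a stated assumption: that the server stores mail in the common Maildir++ layout, where the top-level directory is INBOX and each subfolder is a dot-prefixed directory with its own cur/ and new/ subdirectories. The function name maildir_counts is made up, and the layout on any particular IMAP server may differ.

```shell
# maildir_counts ROOT
# Prints "<count><TAB><folder>" for ROOT (shown as INBOX) and every folder under it.
maildir_counts() {
    root=${1%/}
    find "$root" -type d -name cur | sort | while IFS= read -r cur; do
        folder=${cur%/cur}
        # Delivered mail lives in cur/ and new/; tmp/ holds in-flight files and is skipped
        n=$(( $(find "$folder/cur" "$folder/new" -type f 2>/dev/null | wc -l) ))
        if [ "$folder" = "$root" ]; then
            name=INBOX
        else
            name=${folder#"$root"/}
        fi
        printf '%d\t%s\n' "$n" "$name"
    done
}

# Usage (hypothetical path):
# maildir_counts /home/cathy/.maildir
```

Each Maildir message is one file, so a file count per folder is a message count, and the output can be compared line by line against the numbers the mail client shows for its cached copies.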

Notes
During all of this, new mail continued to arrive at the IMAP server, and remained accessible from the instance of Thunderbird on the netbook.
A copy of Cloudmark Desktop One was running on the Macbook, using Mac Thunderbird to do spam processing of arriving email in Cathy’s IMAP account.
My psyche is scarred, but I did manage to recover from a monstrous mistake.
Lessons

  • RAID IS NOT BACKUP

The IMAP server was reliable, but it didn’t have backups that were useful for recovery.

  • Don’t think you understand what a complex email client is going to do

Don’t experiment with the only copy of something!  I should have made a copy of the IMAP .maildir in a new account, and then futzed with the netbook Thunderbird to get the offline use storage the way I wanted.

  • Quantity has a quality all its own.

This quote is usually about massive armies, but in this case, the very large email store (23 GB) just made the simplest operations slow, and some things (like selecting all in a folder with 34000 messages) impossible.  I had to go through a lot of extra work because various machines didn’t have enough free storage, and had other headaches because the MTBF of the HP laptop was less than the time to complete tasks.
-Larry