A Rubble of Bits – Page 6 – Just another WordPress site

Windows 7 Disk Upgrade

It is a mystery to me why laptop makers charge such a premium for SSDs. Well, no, it’s not a mystery, they do it because they can. Part of the reason is that it is such a pain, in the Windows world, to upgrade.
Cathy recently got a new HP ProBook 640 G1, replacing her ancient Vista machine. The new laptop came with a 128 GB SSD, which served its purpose of demonstrating how dramatically faster the SSD is than a regular hard drive, but it is too small. Her old machine, after deleting about 50 GB of duplicate stuff, was already at 128.
It is much cheaper to buy an aftermarket 256 GB SSD than to buy the same laptop with a larger SSD. so we set about an upgrade.
HP Laptops, at least this one, do not ship with install disks, instead, they come with a 12 GB “recovery partition” that soaks up even more of the precious space. You can reinstall the OS from the recovery partition as often as you like, or you can, exactly once, make a set of recovery DVDs or a recovery USB drive.
There are two main paths to doing a disk upgrade:

Replace the disk, and reinstall from the recovery media
Replace the disk, make the old disk an external drive, and clone the old disk to the new one.

The first path is less risky, so we tried that first. I had purchased a nice, large USB3 thumb drive for the purpose, and … the HP recovery disk creator would not create a USB! What is this, 2004? HP support is actually quite good, and I suppose that is part of what you pay for when you buy a “business” notebook. They were surprised by this lack of functionality, since it is supposed to work, and eventually decided to send us recovery media. They sent DVDs, which is not what we want, but fine.
The HP media worked fine to install onto the new 256 GB SSD, but did not restore much of the HP add on software. Most manufacturer add-on software is crapware, but HPs isn’t bad. We got most of the missing bits from the support.hp.com website except for the HP documentation! You can get the PDF files for the user and service manuals, but not the online HP Documentation app.
Our plan was eventually to trickle down the 128 GB SSD to one of the kids, so we didn’t mind using up its ability to create recovery media, so we tried that next. Rather than screw up the almost-working 256 GB drive, we installed an old 160 GB drive from Samantha’s old Macbook (replaced earlier by an SSD).
The home-created recovery media did better, installing all the HP add-ons…except the documentation!
Now with three working drives, and two sets of recovery disks, I felt confident enough to try the alternative: cloning the original drive. I had a copy of Acronis True Image 2010, but couldn’t find the disk for it. The new SSD came with a copy of True Image 2014, but first I read up on the accumulated wisdom of the Internet. There’s a guy, GroverH, on the Acronis forums (see https://forum.acronis.com/forum/3426 ) who has an astonishing set of howtos.
Manufacturers who use recovery partitions really don’t want you to clone drives, perhaps this is pressure from Microsoft. It works fine if the new drive is exactly the same as the old one, but if not, unless the partition sizes are exactly the same, the result is not likely to work. The cloning software will scale the partitions if you restore to a bigger drive, but they won’t work. You have to manually tweak the partition arrangement. Typically the recovery partition is at the end, the boot partition is at the beginning, and the “C:” drive uses the space inbetween.
Now earlier when I couldn’t find the True Image install disk on another project, I tried the Open Source CloneZilla and was quite happy with it. It is not for the faint-hearted, but it seems reliable. I used CloneZilla to make a backup of the original drive, and then, because the recovery media had already created a working partition structure, merely restored C: to the C: of the experimental 160 GB drive. Windows felt like it had to do a chkdsk, but after that it worked, and lo, the HP documentation was back! (And Cathy’s new screen background.)
As the last step, we put the 256 GB SSD back in, and used CloneZilla to restore C: and the HP_TOOLS partition contents that weren’t quite the same in the original and recovered versions.
Whew!
So, contrast to a disk upgrade on a Mac: Put in new drive, restore from Time Capsule, done. And this restores all user files and applications!
Next challenge: migrating Cathy’s data files and reinstalling applications. Memo to Microsoft: it is just unreasonable that in this new century we still have to reinstall applications one by one.

Hotel Internet – Hyatt French Quarter

I write from my room at the Hyatt French Quarter.
Your hotel internet service stinks.
I would rather stay in a Hampton Inn or like that than a Hyatt. You know why? The internet service in cheap hotels just works. Yours does not.
You advertise “free internet”, but it costs rather a lot in the inconvenience and irritation of your customers, who are paying you quite a lot of money for a nice experience.
I have three devices with me. A laptop, a tablet, and a phone. On each one, every day of my stay, at (apparently) a random time, each one stops working and I have to connect again.
Here is what that takes:

Try to use my email. Doesn’t work
Remember that I have to FIRST use a web browser.
Connect to hotel WiFi (ok, this step is expected, once)
Get browser intercept screen
Type in my name and room number
Wait
Read offer to pay $5 extra for “good” internet service, rather than crappy. The text says this offer “lasts as long as your current package” is that per day? Per stay? What?
Click “continue with current package”
Wait
Get connected to FACEBOOK.

Why? I can’t explain it. People my age think Facebook is something kids use
to share selfies. The kids think Facebook is for, I don’t know, old people, they
are all on Twitter.
Then I have to remember what I wanted to do.
Are you serious? Do you think this process, repeated for my three devices, EVERY DAY, is going to make me recommend your hotel?
Now let us talk about privacy.
It irritates me that you want my name and room number. I do not agree that you can track my activities online. It is none of your business. I run an encrypted proxy server back home. So all your logs will show is that I set up one encrypted connection to the cloud for my web access. My email connections are all encrypted. My remote logins to the office are all encrypted. My IMs are encrypted.
I read the terms and conditions, by the way. They are linked off the sign on page. They are poorly written legalese, and there are a number of ways to read them. One way says that you track all my connections to websites but only link them to my personally identifiable information if you need to “to enforce our terms and conditions”. They also say that you have no obligation to keep my activities confidential. And who or what is Roomlynx?
Even if your terms said otherwise, I wouldn’t believe you. I don’t trust you OR your service providers.
Here’s my suggestion:
I think all this effort you’ve gone to is a waste of time, effort, and money. You do not have the technical means to monitor or control how I use the net anyway, so why make your customers jump through hoops?
If your lawyers tell you these steps are necessary, get different lawyers who have a clue. If you still think it is necessary, have the terms and conditions be attached to the room contract!
If you seriously have a problem with non-guests soaking up your bandwidth, then by all means add a WiFi password, and hand it out at checkin.
If you seriously have a problem with bandwidth hogs, then slow down the connections of actual offenders.
Basically, try your best to make the Internet work as well as the electricity you supply to my room. I turn on the switch, the lights go on. Done.
By the way, modern OS’s like Apples MacOS Yosemite, frequently change the MAC address they use. This will likely break your login system, raising the frustration of your guests even more. They will not blame Apple for trying to protect their privacy. They will blame you. I already do.
PS I don’t like to help you debug a system that is fundamentally broken, but:

The hotel website still says Internet costs $9.95 per day. Update that maybe?
There is no way to go back and pay the extra $5 for better service one you’ve found out how crappy the regular stuff is.
After you connect, you can no longer find the terms and conditions page
I accidently tried to play a video, and your freaking login screen showed up in the video pane. That just makes you look even sillier.

Random Walks

One blog I follow is GÖDEL’S LOST LETTER
In the post Do Random Walks Help Avoid Fireworks, Pip references George Polya’s proof that on regular lattices in one and two dimensions, a random walk returns to the origin infinitely many times, but in three dimensions, the probability of ever returning to the origin is strictly less than one.
He references a rather approachable paper explaining this by Shrirang Mare: Polya’s Recurrence Theorem which explains a proof of this matter using lattices of reisistors in an electrical circuit analogy. The key is that there is infinite resistance to infinity in one or two dimensions, but strictly less than infinite resistance to infinity in three dimensions.
This is all fine, but there is another connection in science fiction. In 1959, E.E. “Doc” Smith’s The Galaxy Primes was published in Amazing Stories.
Our Heros have built a teleporting starship, but they can’t control where it goes. The jumps appear long and random. Garlock says to Belle:

“You can call that a fact. But I want you and Jim to do some math. We know that we’re making mighty long jumps. Assuming that they’re at perfect random, and of approximately the same length, the probability is greater than one-half that we’re getting farther and farther away from Tellus. Is there a jump number, N, at which the probability is one-half that we land nearer Tellus instead of farther away? My jump-at-conclusions guess is that there isn’t. That the first jump set up a bias.”
“Ouch. That isn’t in any of the books,” James said. “In other words, do we or do we not attain a maximum? You’re making some bum assumptions; among others that space isn’t curved and that the dimensions of the universe are very large compared to the length of our jumps. I’ll see if I can put it into shape to feed to Compy. You’ve always held that these generators work at random—the rest of those assumptions are based on your theory?”

Garlock is right – this is a three dimensional random walk and tends not to return to its starting place, but James is wrong when he says this isn’t in any of the books. Polya proved it in 1921.

Duress passwords and other side effects

Fifteen years ago when we built our house, we had a home security system installed. It has the usual alarm panel with a keypad inside the door. When you come in the house, you have 30 seconds to key in your password to stop the alarm from going off.
If the alarm does go off, the monitoring company will call you to find out if it was a mistake or a real alarm. Each authorized user has a passcode to authenticate themselves to the monitoring company. You can’t have the burglar answering the phone “No problem here! False alarm…”
In fact, there are two passcodes, one authenticates you, and the other is a duress password. If the burglar is there with you, you use the duress password, and the monitoring company behaves exactly the same way, but they also call the local police for you. It is important that the burglar cannot tell the difference.
It seems to me that ATM cards should have duress PINs as well as real ones. If a criminal says “type in your ATM pin or else” then fine, you enter the duress PIN. The ATM behaves exactly the same way, but the bank alerts the police and sends them the surveillance video.
Duress passwords have a lot of other potential uses. If your school principal demands your facebook password, you give up your duress password. What happens next could depend on which password you give. At the extreme, your whole account could be deleted. It could be archived on servers out of legal jurisdiction, your stuff visible only to friends could seem not to exist for a week. Whatever. Options that appear not to do anything are best, because then the school admins can’t tell you have disobeyed them and suspend you.
While I am riffing, there should be a phrase you can say, like “I do not consent to this search” or a similar account setting, that makes the administrator’s access an automatic CFAA violation. (I think the CFAA should be junked, but if not, it should be used to user’s benefit, not just the man’s.)
Finally, regarding authentication, there should also be two-factor authentication for everything, and single-use passwords for everything. Why not? Everyone has a nice computing device with them at all times. Of course your phone and the authentication app should have a duress unlock code.
So next time you are building an authentication structure, build in support for one-time passwords, two factor authentication, and a flexible set of duress passwords.

Home HVAC

It has finally gotten cold here. Right now is it about 17F outside. Previously we had been getting by with just the heating zones for the kitchen/family room and the master bedroom turned on.
A few days ago, the boys had trouble getting to sleep while we were watching TV, because the noise from the set was keeping them up. Alex closed the door. The next morning, I noticed it was 55F in their room. Well, I reasoned, the heating zone up there is not turned on, and with the door shut, warm air from the rest of the house can’t get in so easily. I turned on the heat. The next night Alex happened to close the door again, and in the morning it was 52F. That isn’t so good.
Friday we had the neighbors over for dinner so I turned on the dining room heat. A couple hours later I went to check on it and it wasn’t any warmer.

This is our heating system. This is a gas fired hot water system. The “boiler” is the green box on the lower left. It heats water to 160F or so. From there, there are 9 heating zones. The horizontal pipe manifold in the front is the the return path to the boiler. The vertical pipes with yellow shutoffs representing the returns for each zone. The supply manifold is behind, along with the pumps and so forth. One zone heats water in the blue tank for domestic hot water faucets and showers. The other zones have circulating pumps that feed tubing that zigzags under the floors . This is called radiant heating.

Radiant Zone Manifold — Hot water is routed through many plastic tubes that warm the floor from below.

Each zone typically has a manifold like this one that routes hot water through synthetic rubber tubes that are stapled to the undersize of the floors, and insulated below that to direct their heat upwards. This lets you walk around on warm floors and actually get by with colder air temperatures. Our oldest daughter was in the habit of leaving the next day’s clothes on the floor covered with a blanket, so they would be prewarmed in the morning. Notice that one tube is turned off. That one runs underneath the kitchen pantry, which we try to keep colder.
In the main system photo, on the left, you can see electronics boxes on the wall. Here’s a closeup.

Each zone has a thermostat, which comes into one of these boxes. This is a three channel box, with three 24 volt thermostats coming in on brown wires at the top, and red wiring for three 120 volt zone circulator pumps at the bottom. The box also signals the main boiler that heat is being called by at least one zone. Each zone has a plug in relay, one of which I have unplugged.
The circulator pumps look like this

So there is a central gas water heater, which feeds a number of zones. Each zone has a water circulation pump, controlled by a thermostat. The pump feeds hot water through rubber tubes on the underside of the floors.
Individual zones have failed before. I have fixed them by replacing the circulator pump. You can get these anywhere.

The hardest part about replacing these is the electrical wiring, which is hardwired by wirenuts in the green box attached to the pump. First, turn off the power. I did this by physically pulling the relay for the appropriate zone. Then I measure the pump current using a clamp on ammeter. Then I measure the voltage. Only then do I unscrew the wirenuts protecting the wires, and without touching the bare wires, touch the end to ground. Then brush the wire with the back of your hand only. If the wire is live, the electricity will contract your arm muscles, pulling your hand away. If you can’t think of at least four ways to make sure the wires are not live, hire someone to do this for you. Really. There are old electricians, and there are bold electricians. There are no old, bold electricians. I am an old electrician.
Our system has shutoff valves immediately on both sides of the pump. By turning those off, you can swap out the pump without draining all the water out of the system. As you can see in the picture, the pump is held in place by flanges at the inlet (bottom) and outlet (top). Each flange has two stainless steel bolts, so they won’t rust. In a burst of cleverness or good design, the nuts on these bolts are 11/16 and the bolts themselves are 5/8, so you can take them apart with only one set of wrenches. Here’s the pump I removed.

Note the corrosion inside the pump. I put the new pump in place and turned this zone back on, and now the dining room was getting heat. While I was down there, I took a look at this thing.

This is an air removal valve. It is installed on top of the boiler, along with a pressure relief valve. On some intuition, I lifted the pressure relief valve toggle, and air came out, followed by water. That is not good. The water for a heating system like this comes from town water, which has dissolved gas in it. Typically this will be air, although in the Marcellus Shale areas it can be natuural gas (in those areas, you can set your sink on fire). Air is bad for forced hot water systems. it corrodes the inside of the pipes, and water pumps won’t pump air, usually. If the radiant tubes get full of air, they will not be heating. By the way, these pipes are so rusty because some years ago the boiler was overheating to the point that the relief valve was opening, getting water everywhere. This was because the temperature sensor had come unstuck from the pipe it was measuring. Fixed by a clever plumber with a stainless pipe clamp. As collateral damage from rapid cycling, I had to get a new gas valve too. Separate story.
After waiting a few few minutes, I tried the relief valve again and got more air. This meant that the air removal valve wasn’t working, and probably some of my zones weren’t working because of air-bound pumps or bubbles in the pipes. You might be wondering how the valve knows to let out air, but not water. Inside the cylinder is a float. When there is water inside, the float rises and closes the output port. When there is air inside, the float falls, opening the outlet port and letting out the air. It is pretty simple. I called a plumber friend to see if he could fix this and he said “if you can replace a zone pump, you can replace this valve too.” Basically, you turn off the system, close all the valves, to minimize the amount of water that will come out, depressurize the system, and work fast. A new valve was $13 at Home Depot. The fact they had 10 in stock suggests they do go bad. Unfortunatly I failed to depressurize the system as well as I thought, and I got a 3 foot high gusher of 130F water. Be careful! Heating systems run at around 10 psi. The pressure comes partly from town water pressure through a pressure regulator, and partly from the expansion of hot water. There is an expansion tank to reduce that effect.
The next day, I tried the pressure relief valve again and got water immediately. Probably this means the new valve is working.

Each zone has a temperature gauge. You can see that the two on the right are low, and the two on the left in this picture are not. The right hand zone had the pump I replaced. The next one was not turned on. The temperature gauges are there because you don’t want to run 160F water through these radiant tubes. The floors will get too hot and the tubes won’t last very long. Instead, each zone has a check valve and a mixing valve.

The check valve keeps the loop from flowing backwards, or generally keeps it from circulating by gravity. Cold water is slightly denser than hot water, so the water on the colder side of the loop will fall, pulling hot water around the loop even without the pump running. The spring in the check valve is enough to stop gravity circulation.

The mixing valve has a green adjusting knob. This valve mixes hot water from the boiler with cooler water from the return leg of the zone, and serves to adjust the temperature of the water in each zone. Some water recirculates, with some hot water added.
When I turned on the zone second from the right, it did not work. The temperature gauge stayed put at 80F, (conducted heat through the copper pipes). I used my ammeter to confirm the pump was drawing power. I turned off the valves for all the other zones, so that this one would have more water. Didn’t work.

There are three reasons why a hot water zone might not work: the pump is not spinning, the pump is trying to pump air, or the pipes are clogged. I had just replaced a pump to fix a zone, but was there a second bad pump? Or something else?
I have an intra-red non-contact thermometer, and I used it to measure the pump housing temperatures. The working pumps were all around 125F, the non working pump was around 175F. That might mean that it was stalled, and not spinning, or that it was pumping air, and not being cooled by the water. I had one more spare pump, but I was getting suspicious.
I got to wondering if the pump I removed was really broken. I knew that these Taco 007-F5 pumps have a replaceable cartridge inside, but since the cartridge costs almost as much as a new pump I had never bothered with it. I decided to take apart the pump I removed to see what it looks like.

Disassembled pump — Disassembled Taco 007-F5 Pump

The pump housing is on the left. The impeller attached to the replaceable cartridge is in the center, and the motor proper is on the right. The impeller wasn’t jammed, but I wanted to know if it was working at all. I cut the cord off a broken lamp and used it to wire up the pump.

I was careful not to touch the pump when plugged in, because you will notice there is no ground. The impeller worked fine. Probably there was never anything wrong with the pump. While I had it set up like this, I measured 0.7 Amps current when running, which is what it should be. I then held on to the (plastic) impeller and turned it on. When stalled, the motor draw rose to 1.25 Amps. I now had a way to tell if a motor was stalled or spinning! The suspect zone was drawing .79 Amps, which probably means it was spinning, and the high temperature meant there was no water inside.
Around this time, Win called to ask me to go pick up firewood. While waiting I explained all this to Cathy. She has a PhD in Chemical Engineering, and has forgotten more about pipes and fluid flow that I will ever learn. She says “are the pumps self-priming?” Priming is the process of getting water into the pump so that they have something to pump. A self priming pump will pump air well enough to pull water up the pipe from a lower reservoir. A non-self-priming pump will not. These pumps are not self-priming. They depend on something else to get started. Cathy says “are the pumps below the reservoir level?” No. they are above the boiler. Cathy says “I would design such a thing with the pumps below the reservoir level, so they prime automatically”. Um, OK, but how does that help me? Cathy says “Turn off the top valve, take off the top flange and pour water in the top.” Doh.
I didn’t quite do that, because I remembered the geyser I got taking off the air vent. If I could let air out the top, water might flow in from below. All I did was loosen the bolts on the top flange a little. After about 10 seconds, I started getting water drops out of the joint, so I tightened the bolts and turned on the pump. Success! After a few minutes, the temperature gauge started to rise.
So probably my problems were too much air in the system all along.
On the way to buy a new air vent, I stopped at Win’s house to check his air vent, but we couldn’t find it! Either it is hidden away pretty well, which seems like a bad idea, or there isn’t one, which also seems like a bad plan. We’re puzzled, but he has heat. And now, so do I!
UPDATE 12/15/2013
One heating zone still doesn’t work. The temperature gauge near the pump rises to 100, and the nearby pipes are warm, but the pipes upstairs (this is a second floor zone) are cold. I replaced the cartridge of the pump with the one I took apart the other day, and it spins, but there is no change. The pump is drawing current consistent with spinning. I loosen the top flange above the pump and water comes out. These symptoms are consistent with the pump spinning, and having water, but there is no flow all the way around the loop.
I took a detour to the Taco website and looked at the pump performance curves for the 007-F5, which are at http://www.taco-hvac.com/uploads/FileLibrary/00_pumps_family_curves.pdf. A pump has a certain ability to push water uphill. The weight of water above the pump more or less pushes back on the pressure generated by the pump. This height of water is called the “head”. A pump will pump more water against a lower head, and as the head is larger, the pump will deliver less and less water. Above a certain head it won’t work at all. According to the performance curves for my circulating pumps, their flow rate will drop to 0 at 10 feet of head. From the pump location to the distribution manifold in the wall behind the closet in the upstairs bedroom is about 18 feet. This pump cannot work if the pipe is not completely full of water. If both the supply pipe to the upstairs and the return pipe coming back are full of water, then because water is incompressible, the suction of the water falling down the return pipe will balance the weight of water in the supply pipe. If the pipe is full of air, as it likely is, then this pump is not powerful enough to lift water to the top.
The solution to this is to “purge” the air out of the pipes, by using some external source of pressure to push water into the supply end until all the air is pushed out of the return end. For this to work, the return end must be opened up to atmosphere, otherwise there’s no place for the air to go. (It will likely just get squeezed by the pressure, but there is no route for it to get to, for example, the air vent. I think you need a pretty high flow rate to do this, because the return pipe is 3/4 inch, and without a high flow rate, the air bubbles will float up against the downwards flow of water.
Some systems have air vents at the high points. Mine do not. This would help, because water would flow up both the supply and return pipes, lifted by the 10PSI system pressure. Since it only takes 7.8 psi pressure to lift water 18 feet, this would completely fill the pipes. Of course there would be a potentially leaky air vent inside the walls upstairs, to cause trouble in some future year. I don’t know if the lack of vents is sloppy installation or if one is supposed to use some other method of purging.
My system installation has no obvious (to me anyway) purge arrangements. To purge, you shut off valves on the boiler, put a hose from a valve on the return side into a bucket of water, and turn on external water on the supply side. When the host stops bubbling air, you are good to go.
In my system, makeup water comes from the house cold water pipes, through a backflow preventer and a pressure reduction valve to the hot water manifold. The return pipes from the zones flow to the boiler return manifold and then to the boiler. There is no master return shutoff, and no purge tap on the return maniforld. There is a drain tap on the boiler itself, and there is a tap between the boiler and a valve that can isolate the boiler from the hot water supply manifold. The pressure regulator has a little lever on the top that according to its user manual will open the regulator and let more water through for purging.
I could close the valve to isolate the boiler from the supply manifold, but then the purge water has to run all the way through the boiler to get to the outlet hose. I would lose all the hot water in the boiler.
But I have a missing pump! Years ago, I borrowed the pump from the zone that heats the study, and never put it back. I closed all the supply zone valves except the bad zone, and closed all the return valves except the bad zone and the study zone. I closed the main boiler output valve. At this point, the only path through the system was from the makeup water regulator, through the broken zone, to the return manifold, backwards into the study zone return pipe, through the cold side of the study zone mixing valve, and out the bottom flange of the not-present study pump.
I put a bucket under it and opened the bottom study zone pump valve. Water came out, but after a few gallons, I only get a trickle. I can hear hissing when I open the regulator toggle, but I suspect there is not enough flow to do effective purging. The setup is complicated, so I am not completely sure. In any case, this didn’t fix the not-working zone.
Next step: test the pressure regulator flow by closing all valves except makup water and the tap that is connected to the boiler outlet manifold. That will let me see the flow supplied by the regulator. I found an old backflow valve and regulator set on the floor. Evidently it was replaced at some point. The old one had a pretty clogged looking input screen, so perhaps that is the trouble with the current one as well. That wouldn’t affect normal operations because you don’t need makeup water unless there is a leak.

Bike Safey

I wrote this for the Wayland E News. I’m putting a copy here as well.

I’ve been biking to work. In Cambridge. Not that often, because I am not one of these spandex bikers, but a middle aged, somewhat overweight, t-shirt biker.
I just wanted to mention a few things that would help me survive the week.
I am eagerly awaiting a paved Wayland Rail Trail from the town Library through to Weston, but in the meantime, I bike along route 20. The problem is that few roads in Wayland are bike friendly, but you can help!
(About that rail trail, please see Wayland Rail Trail and check out the Minuteman Bike Trail from Lexington to Alewife or the Charles River Bike Path )
For my fellow residents:

Take a look at the street in front of your house or other property.
Keep the shoulders clear of debris, sand, leaves, sticks, broken glass, etc.
Try and deal with the poison ivy that loves the edges of roads. I am so allergic to that stuff that I don’t dare ride right at the edge.
If you have a sidewalk, please keep it clear. In addition to the debris, it is hard to navigate around those mailboxes and trash cans.

For our public works folks:

When we do have sidewalks, they tend to be pretty awful, and unusable for bicycles. The paving isn’t up to street standards, and is broken by roots, holes. etc.
The sidewalks tend to fill up with leaves, fallen branches, and so forth, which make them unusable.
Guy wires cross from utility poles at just the right hight to clothesline a tall guy like me. Of course they are invisible at dusk!
Many road corners lack curb cuts, so you can’t actually get on or off the sidewalk anyway.

Without sidewalks, I have to ride in the street. That is fine, but…

The shoulders are, um, badly paved: potholes, jagged gaps in the top paving, bumpy drains
The shoulders collect sand, which is like ice for bicycles, you can’t steer on sand.
On Route 20, there is an unfortunate amount of broken glass.

Maybe we could street sweep more than once a year?
And that paving on Pelham Island Road is nasty, but that is a topic for a different letter.
For Drivers:
Most drivers are actually pretty awesome with bicyclists, Thank you! However:

Look at that right side mirror once in a while. When you are caught in traffic, I will be passing you at my astounding 12 miles an hour or whatever. I’ll be coming up on your right.
Don’t keep so far to the right that there isn’t room for me! The lanes are actually fairly wide and the shoulder is often very narrow.

For my part, I signal, I don’t run red lights, and I really try to watch where I am going and to be aware of my surroundings, but not every cyclist (especially the kids) will follow the rules. Treat them with suspicion and when possible, give extra space when passing a cyclist, just in case they have no idea you are there and swerve to miss a stick or pothole.
-Larry

BIOS vs GPT

This might be the 1000th blog posting on this general topic, but for some reason, the complexity of booting grows year over year, sort of like the tax code.
Back in 2009, Win and I built three low power servers, using Intel D945GCLF2 mini-ITX motherboards with Atom 330 processors. We put mirrored 1.5 Terabyte drives in them, and 2 GB of ram, and they have performed very well as pretty low power home servers. We ran the then-current Ubuntu, and only sporadically ran apt-get update and apt-get upgrade.
Fast forward to this summer. We wanted to upgrade the OS’s, but they had gotten so far behind that apt-get update wouldn’t work. It was clearly necessary to reinstall. Now one of these machines is our compound mail server, and another runs mythtv and various other services. The third one was pretty idle, just hosting about a terabyte of SiCortex archives. In a previous blog post I wrote about the month elapsed time it took me to back up that machine.
This post is about the adventure of installing Ubuntu 12.04 LTS on it. (LTS is long term support, so that in principle, we will not have to do this again until 2017. I hope so!)
Previously, SMART tools were telling us that the 2009 era desktop 1.5T drives were going bad, so I bought a couple of 3T WD Red NAS drives, like the ones in our Drobo 5N. Alex (my 14 year old) and I took apart the machine and replaced the drives, with no problem.
I followed directions from the web on how to download an ISO and burn it to a USB drive using MacOS tools. This is pretty straightforward, but not obvious. First you have to convert the iso to a dmg, then use dd to copy it to the raw device:

hdiutil convert -format UDRW -o ubuntu-12.04.3-server-amd64.img ubuntu-12.04.3-server-amd64.iso
# Use diskutil list, then plug in a blank USB key >the image size, run diskutil list again to find the drive device.  (In my case /dev/disk2)
sudo dd if=ubuntu-12.04.3-server-amd64.img.dmg of=/dev/disk2 bs=1m
# notice the .dmg extension that MacOS insists on adding
diskutil eject /dev/disk2 (or whatever)

Now in my basement, the two servers I have are plugged into a USB/VGA monitor and keyboard switch, and it is fairly slow to react when the video signal comes and goes. In fact it is so slow that you miss the opportunity to type “F2” to enter the BIOS to set the boot order. So I had to plug in the monitor and keyboard directly, in order to enable USB booting. At least it HAS USB booting, because these machines do not have optical drives, since they have only two SATA ports.
Anyway, I was able to boot the Ubuntu installer. Now even at this late date, it is not really well supported to install onto a software RAID environment. It works, but you have to read web pages full of advice, and run the partitioner in manual mode.
May I take a moment to rant? PLEASE DATE YOUR WEB PAGES. It is astonishing how many sources of potentially valuable information fail to mention the date or versions of software they apply to.
I found various pieces of advice, plus my recollection of how I did this in 2009, and configured root, swap, and /data as software RAID 1 (mirrored disks). Ubuntu ran the installer, and… would not reboot. “No bootable drives found”.
During the install, there was an anomaly, in that attempts to set the “bootable” flag on the root filesystem partitions failed, and when I tried it using parted running in rescue mode, it would set the bootable flag, but clear the “physical volume for RAID” flag.
I tried 12.04. I tried 13.04. I tried 13.04 in single drive (no RAID). These did not work. The single drive attempt taught me that the problem wasn’t the RAID configuration at all.
During this process, I began to learn about GPT, or guid partition tables.
Disks larger than 2T can’t work with MBR (master boot record) style partition tables, because their integers are too small. Instead, there is a new GPT (guid partition table) scheme, that uses 64 bit numbers.
Modern computers also have something called UEFI instead of BIOS, and UEFI knows about GPT partition tables.
The Ubuntu installer knows that large disks must use GPT, and does so
Grub2 knows this is a problem, and requires the existence of a small partition flagged bios_grub, as a place to stash its code, since GPT does not have the blank space after the sector 0 boot code that exists in the MBR world (which grub uses to stash code).
So Ubuntu creates the GPT, the automatic partitioning step creates the correct mini-partition for grub to use, and it seems to realize that grub should be installed on both drives when using an MD filesystem for root. (it used the command line grub-install /dev/sda /dev/sdb) Evidently the grub install puts a first stage loader in sector 0, and the second stage loader in the bios_grub partition.
Many web pages say you have to set the “bootable” flag on the MD root, but parted will not let you do this,because in GPT, setting a “bootable” flag is forbidden by the spec. Not clear it would work anyway because when you set it, the “physical volume for raid” flag is turned off.
The 2009 Atom motherboards do not have a UEFI compatible BIOS, and are expecting an MBR. When they don’t find one, they give up. If they would just load the code in sector 0 and jump to it it would work. I considered doing a bios update, but it wasn’t clear the 2010 release is different in this respect.
So the trick is to use FDISK to <create an MBR> with a null partition. This is just enough to get past the Atom BIOS’ squeamishness and have it execute the grub loader, which then works fine using the GPT. I got this final trick from http://mjg59.dreamwidth.org/8035.html whose final text is

boot off a live CD and run fdisk against the boot disk. It’ll give a bunch of scary warnings. Ignore them. Hit “a”, then “1”, then “w” to write it to disk. Things ought to work then.

The sequence of steps that worked is:

Run the installer
Choose manual disk partitioning
Choose "automatically partition" /dev/sda
This will create a 1 MB bios_grub partition and a 2GB swap, and make the rest rootDelete the root partition
Create a 100 GB partition from the beginning of the free space
Mark it "physical volume for RAID" with a comment that it is for root 
Use the rest of the free space (2.9T) to make a partition, mark it physical volume for raid.  Comment that it is for /data
Change the type of the swap partition to "physical volume for raid"
Repeat the above steps for /dev/sdb
Run "configure software RAID"
Create MD volume, using RAID 1 (mirrored)
Select 2 drives, with 0 spares
Choose the two swap partitions
Mark the resulting MD partition as swap 
Create MD volume, RAID 1, 2, and 0
Select the two 100 GB partitions
Mark them for use as EXT4, to be mounted on /
Create MD volume, RAID 1, 2, and 0
Select the two 2.9T partitions
Mark them for use as EXT4, to be mounted on /data 
(I considered BTRFS, but the most recent comments I could find still seem to regard it as flakey)
Save and finish installing Ubuntu
Pretend to be surprised when it won't boot.  "No bootable disks found"
Reboot from the installer USB, choose Rescue Mode
Step through it. Do not mount any file systems, ask for a shell in the installer environment.
When you get a prompt,
fdisk /dev/sda
a
1
w
Then
fdisk /dev/sdb
a
1
w
^d and reboot. Done
Now I have a working Ubuntu 12.04 server with mirrored 3T drives.

Big Data

I propose a definition of Big Data. Big Data is stuff that you cannot process within the MTBF of your tools.
Here’s the story about making a backup of a 1.1 Terabyte filesystem with several million files.
A few years ago, Win and I built a set of home servers out of mini-ATX motherboards with Atom processors and dual 1.5 Terabyte drives. We built three, one for Win’s house, that serves as the compound IMAP server and such like, one for my house, which mostly has data and a duplicate DHCP server and such like, and one, called sector9, which has the master copy of the various open source SiCortex archives.
These machines are so dusty that it is no longer possible to run apt-get update, and so we’re planning to just reinstall more modern releases. In order to do that, it is only prudent to have a couple of backups.
In the case of sector9, it has a pair of 1.5 T drives set up as RAID 1 (mirrored). We also have a 1.5T drive in an external USB case as a backup device. The original data is still on a 1T external drive, but with the addition of this and that, the size of sector9’s data had grown to 1.1T.
I decided to make a new backup. We have a new Drobo5N NAS device, with 3 3T drives, set up for single redundancy, giving it 6T of storage. Using 1.1T for this would be just fine.
There have been any number of problems.
Idea 1 – mount the Drobo on sector9 and use cp -a or rsync to copy the data
The Drobo supports only AFP (Apple Filesharing Protocol) and CIFS (Windows file sharing). I could mount the Drobo on sector9 using Samba, except that sector9 doesn’t already have Samba, and apt-get won’t work due to the age of the thing.
Idea 2 – mount the Drobo on my Macbook using AFP, and mount sector9 on the Macbook using NFS.
Weirdly, I had never installed the necessary packages on sector9 to export filesystems using NFS.
Idea 3 – mount the Drobo on my Macbook using AFP and use rsync to copy files from sector9.
This works, for a while. The first attempt ran at about 3 MB/second, and copied about 700,000 files before hanging, for some unknown reason. I got it unwedged somehow, but not trusting the state of everything, rebooted the Macbook before trying again.
The second time, rsync took a couple of hours to figure out where it was, and resumed copying, but only survived a little while longer before hanging again. The Drobo became completely unresponsive. Turning it off and on did not fix it.
I called Drobo tech support, and they were knowledgeable and helpful. After a long sequence of steps, invoving unplugging the drives, and restarting the Drobo without the mSata SSD plugged in, we were able to telnet to it management port, but the Drobo Desktop management application still didn’t work. That was in turn resolved by uninstalling and reinstalling Drobo Desktop (on a Mac! Isn’t this disease limited to PCs?)
At this point, Drobo tech support asked me to use the Drobo Desktop feature to download the Drobo diagnostic logs and send them in….but the diagnostic log download hung. Since the Drobo was otherwise operational, we didn’t pursue it at the time. (A week later, I got a followup email asking me if I was still having trouble, and this time the diagnostic download worked, but the logs didn’t show any reason for the original hang.)
By the way, while talking to Drobo tech support, I discovered a weath of websites that offer extra plugins for Drobos (which run some variant of linux or bsd). They include an nfs server, but using it kind of voids your tech support, so I didn’t
A third attempt to use rsync ran for a while before mysteriously failing as well. It was clear to me that while rsync will synchronize two filesystems, it might never finish if it has to check its work from the beginning and doesn’t last long enough to finish.
I was also growing nervous about the second problem with the Drobo, that it uses NTFS, not a a linux filesystem. As such, it was not setting directory dates, and was spitting warnings about symbolic links. Symbolic links are supposed to work on the Drobo. In fact, I could use ln -s in a Macbook shell just fine, but what shows up in a directory listing is subtly different than what shows up in a small rsync of linux symbolic links.
Idea 4: Mount the drobo on qadgop (my other server, which does happen to have Samba installed) and use rsync.
This again failed to work for symbolic links, and a variety of attempts to change the linux smb.conf file in ways suggested by the Internet didn’t fix it. There were suggestions to root the Drobo and edit its configuration files, but again, that made me nervous.
At this point, my problems are twofold:

How to move the bits to the Drobo
How to convince myself that any eventual backup was actually correct.

I decided to create some end-to-end check data, by using find and md5sum to create a file of file checksums.
First, I got to wondering how healthy the disk drives on sector9 actually were, so I decided to try SMART. Naturally, the SMART tools for linux were not installed on sector9, but I was able to download the tarball and compile them from sources. Alarmingly, SMART told me that for various reasons I didn’t understand, both drives were likely to fail within 24 hours. They told me the external USB drive was fine. Did it really hold a current backup? The date on the masking tape on the drive said 5/2012 or something about a year old.
I started find jobs running on both the internal drives and the external:

find . -type f -exec md5sum {} ; >s9.md5
find . -typef -exec md5sum {} ; >s9backup.md5

These jobs actually ran to completion in about 24 hours each. I now had two files, like this:
root@sector9:~# ls -l *.md5 -rw-r--r-- 1 root root 457871770 2013-07-08 01:24 s9backup.md5 -rw-r--r-- 1 root root 457871770 2013-07-07 21:39 s9.md5 root@sector9:~# wc s9.md5 3405297 6811036 457871770 s9.md5
This was encouraging, the files were the same length, but diffing 450 MB files is not for the faint of heart, expecially since find doesn’t enumerate them in the same order. I had to sort each file, then diff the sorted files. This took a while, but in fact the sector9 filesystem and its backup were identical. I resolved to use this technique to check any eventual Drobo backup. It also relieved my worries that the internal drives might fail at any moment. I also learned that the sector9 filesystem had about 3.4 million files on it.
Idea 5: Create a container file on the Drobo, with an ext2 filesystem inside, and use that to hold the files.
This would solve the problem of putting symbolic links on the Drobo filesystem (even though it is supposed to work!) It would also fix the problem of NTFS not supporting directory timestamps or linux special files. I was pretty sure there would be root filesystem images in the sector9 data for the SiCortex machine and for its embedded processors, and I would need special files.
But how to create the container file? I wanted a 1.2 Terabyte filesystem, slightly bigger than the actual data used on sector9.
According to the InterWebs, you use dd(1), like this:
dd if=/dev/zero of=container.file block=1M seek=1153433 count=0
I tried it:
dd if=/dev/zero of=container.file block=1M seek=1153433
It seemed to take a long time, so I thought probably it was creating a real file, instead of a sparse file, and went to bed.
The next morning it was still running.
That afternoon, I began getting emails from the Drobo that I should add more drives, as it was nearly full, then actually full. Oops. I had left off the count=0.
Luckily, deleting a 5 Terabyte file is much faster than creating one! I tried again, and the dd command with count=0 ran very quickly.
I thought that MacOS could create the filesystem, but I couldn’t figure out how. I am not sure that MacOS even has something like the linux loop device, and I couldn’t figure out how to get DiskUtility to create a unix filesystem in an image file.
I mounted the Drobo on qadgop, using Samba, and then used the linux loop device to give device level access to the container file, and I was able to mkfs an ext2 filesystem on it.
Idea 6: Mount the container file on MacOS and use rsync to write files into it.
I couldn’t figure out how to mount it! Again, MacOS seems to lack the loop device. I tried using DiskUtility to pretend my container file was a DVD image, but it seems to have hardwired the notion that DVDs must have ISO filesystems.
Idea 7: Mount the Drobo on linux, loop mount the container, USB mount the sector9 backup drive.
This worked, sort of. I was able to use rsync to copy a million files or so before rsync died. Restarting it got substantially further, and a third run appeared to finish.
The series of rsyncs took several couple of days to run. Sometimes they would run at about 3 MB/s, and sometimes at about 7 MB/sec. No idea why. The Drobo will accept data at 11 MB/sec using AFP, so perhaps this was due to slow performance of the USB drive. The whole copy took close to 83 hours, as calculated by 1.1 T at 3 MB/sec.
Unfortunately, df said the container filesystem was 100% full and the final rsync had errors “previously reported” but scrolled off the screen. I am pretty sure the 100% is a red herring, because linux likes to reserve 10% of space for root, and the container file was sized to be more than 90% full.
I reran the rsync, under a script(1) to get a log file, and found many errors of the form “can’t mkdir <something or other>”.
Next, I tried mkdir by hand, and it hung. Oops. Ps said it was stalled in state D, which I know to be disk wait. In other words, the ext2 filesystem was damaged. By use of kill -9 and waiting, I was able to unmount the loop device and Drobo, and remount the Drobo.
Next, I tried using fsck to check the container filesystem image.
fsck takes hours to check a 1.2T filesystem. Eventually, it started asking me about random problems and could I authorize it to fix them? After typing “y” a few hundred times, I gave up and killed the fsck and restarted it fsck -p to automatically fix problems. Recall that I don’t actually care if it is perfect, because I can rerun rsync and check the final results using my md5 checksum data.
The second attempt to run fsck didn’t work either:
root@qadgop:~# fsck -a /dev/loop0 fsck 1.41.4 (27-Jan-2009) /dev/loop0 contains a file system with errors, check forced. /dev/loop0: Directory inode 54583727, block 0, offset 0: directory corrupted
/dev/loop0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
Hoping that the fsck -a had fixed most of the problems, I ran it a third time again without -a, but I wound up typing ‘y’ a few hundred more times. fsck took about 300 minutes of CPU time on the Atom to do this work and left 37 MB worth of files and directories in /lost+found.
With the container filesystem repairs, I started a fourth rsync, which actually finished, transferring another 93 MB.
Next step – are the files really all there and all the same? I’ll run the find -exec md5sum to find out.
Um. Well. What does this mean?
root@qadgop:~# wc drobos9.md5 s9.md5
3526171 7052801 503407914 drobos9.md5
3405297 6811036 457871770 s9.md5
The target has 3.5 million files, while the source has 3.4 million files! That doesn’t seem right. An hour of running “du” and comparing the top few levels of directories shows that while rerunning rsync to finish interrupted copies works, you really have to use the same command lines. I had what appeared to be a complete copy one level below a partial copy. After deleting the extra directories, and using fgrep and sed to rewrite the path names in the file of checksums, I was finally able to do a diff of the sorted md5sum files:
Out of 3.4 million files, there were 8 items like this:
< 51664d59ab77b53254b0f22fb8fdb3a8 ./sicortex-archive/stash/97/sha1_97e18c8e2261b09e21b0febd75f61635d7631662_64088060.bin
—
> 127cc574dcb262f4e9e13f9e1363944e ./sicortex-archive/stash/97/sha1_97e18c8e2261b09e21b0febd75f61635d7631662_64088060.bin
1402503c1402502
and one like this:
> 8d9364556a7891de1c9a9352e8306476 ./downloads.sicortex.com/dreamhost/ftp.downloads.sicortex.com/ISO/V3.1/.SiCortex_Linux_V3.1_S_Disk1_of_2.iso.eNLXKu
The second one is easier to explain, it is a partially completed rsync, so I deleted it. The other 8 appear to be files that were copied incorrectly! I should have checked the lengths, because these could be copies that failed due to running out of space, but I just reran rsync on those 8 files in –checksum mode.
Final result: 1.1 Terabytes and 3.4 million files copied. Elapsed time, about a month.
What did I learn?

Drobo seems like a good idea, but systems that ever need tech support intervention make me nervous. My remaining worry about it is proprietary hardware. I don’t have the PC ecosystem to supply spare parts. Perhaps the right idea is to get two.
Use linux filesystems to hold linux files. It isn’t just Linus’ and his files that vary only in capitalization, it is also the need to hold special files and symlinks. Container files and loop mounting works fine.
Keep machines updated. We let these get so far behind that we could no longer install new packages.
A meta-rsync would be nice, that could use auxiliary data to manage restarts.
Filesystems really should have end-to-end checksums. ZFS and BTRFS seem like good ideas.
SMB, or CIFS, or the Drobo, or AFP are not good at metadata operations, it was a fail to try writing large numbers of individual files on the Drobo, no matter how I tried it. SMB read write access to a single big file seems to be perfectly reliable.

wget

I am struggling here decide whether the Bradley Manning proscecutors are disingenuous or just stupid.
I am reacting here to Cory Doctorow’s report that the government’s lawyers accuse Manning of using that criminal spy tool wget.
Notes from the Ducking Stool
I am hoping for stupid, because if they are suggesting to the jury facts they know not to be true, then that is a violation of ethics, their oaths of office, and any concept of justice.
Oh, and wget is exactly what I used, the last time I downloaded files from the NSA.
Really.
A while back, the back issues of the NSA internal newsletter Cryptolog were declassified so I downloaded the complete set. I think the kids are puzzled about why I never mind having to wait in the car for them to finish something or other, but it is because I am never without a large collection of fascinating stuff.
Here’s how I got them, after scraping the URLs out of the agency’s HTML:
wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_01.pdf wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_02.pdf wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_03.pdf wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_04.pdf
. . .
wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_132.pdf wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_133.pdf wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_134.pdf wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_135.pdf wget http://www.nsa.gov/public_info/_files/cryptologs/cryptolog_136.pdf

Email Disaster Recovery and Travel adventures

Cathy is off to China for a few weeks. She wanted email access, but not with her usual laptop.
She uses Windows Vista on a plasticy HP laptop from, well, the Vista era. It is quite heavy, and these days quite flaky. It has a tendency to shut down, although not for any obvious reason, other maybe than age, and being Vista running on a plasticy HP laptop.
I set up the iPad, but Cathy wanted a more familiar experience, and needed IE in order to talk to a secure webmail site, so we dusted off an Asus EEE netbook running Windows XP.
I spent a few hours trying to clear off several years off accumulated crapware such as three different search toolbars attached to Internet Explorer, then gave up and re-installed XP from the recovery partition. 123 Windows Updates later, it seemed fine, but still wouldn’t talk to the webmail site. It turns out that Asus thoughtfully installed the open source local proxy server Privoxy, with no way to uninstall it. If you run the Privoxy uninstall, it leaves you with no web access at all. I finally found Interwebs advice to also uninstall the Asus parental controls software, and that fixed it.
Next, I installed Thunderbird, and set it up to work with Cathy’s account on the family compound IMAP server. I wanted it to work offline, in case of spotty WiFi access in China, but after setting that up, so I “unsubscribed” to most of the IMAP folders and let it download. Now Cathy’s inbox has 34,000 messages in it, and I got to thinking “what about privacy?” After all, governments, especially the United States, claim the right to search all electronic devices at the border, and it is also commonly understood that any electronic device you bring to China can be pwned before you come back.
Then I found a setting that tells Thunderbird to download only the last so many days for offline use. Great! But it had already downloaded all 6 years of back traffic. Adjacent, there is a setting for “delete mail more than 20 days (or whatever) old.”
You know what happens next! I turned that on, and Thunderbird started deleting all Cathy’s mail, both locally and on the server. Now there is (farther down the page), fine print that explains this will happen, but I didn’t read it.
Parenthetically, this is an awful design. It really looks like a control associated with how much mail to keep for offline use, but it is not. It is a dangerous, unguarded, unconfirmed command that does irreversible damage.
I thought this was taking too long, but by the time I figured it out, it was way too late.
So, how to recover?
I have been keeping parallel feeds from Cathy’s email, but only since March or so, since I’ve been experimenting with various spam supression schemes.
I had made a copy of Cathy’s .maildir on the server, but it was from 2011.
But wait! Cathy’s laptop was configured for offline use, and had been turned off. Yes! I opened the lid and turned off WiFi as quickly as possible, before it had a chance to sync. (Actually, the HP has a mechanical switch to turn off WiFi, but I didn’t know that.) I then changed the username/password on her laptop Thunderbird to stop further syncing.
Next, since the horse was well out of the barn, I made a snapshot of the server .maildir, and of the HP laptop’s Thunderbird profile directories. Now, whatever I did, right or wrong, I could do again.
Time for research!
What I wanted to do seemed clear: restore the off-line saved copies of the mail from the HP laptop to the IMAP server. This is not a well-travelled path, but there is some online advice:
http://www.fahim-kawsar.net/blog/2011/01/09/gmail-disaster-recovery-syncing-mail-app-to-gmail-imap/
https://support.mozillamessaging.com/en-US/kb/profiles
The general idea is:

Disconnect from the network
Make copies of everything
While running in offline mode, copy messages from the cached IMAP folders to “Local” folders
Reconnect to the network and sync with the server. This will destroy the cached IMAP folders, but not the new Local copies
Copy from the Local folders back to the IMAP server folders

Seems simple, but in my case, there were any number of issues:

Not all server folders were “subscribed” by Thunderbird, and I didn’t know which ones were
The deletion was interrupted at some point
I didn’t want duplicated messages after recovery
INBOX was 10.3 GB (!)
The Thunderbird profile altogether was 23 GB (!)
The HP laptop was flakey
Cathy’s about to leave town, and needs last minute access to working email

One thing at a time.
Tools
I found out about “MozBackup” and used it to create a backup copy of the HP laptop’s profile directory.
MozBackup
MozBackup creates a zip file of the contents of a Thunderbird profile directory, and can restore them to a different profile on a different computer, making configuration changes as appropriate. This is much better than hand editting the various Thunderbird configuration files.
Hardware problems
As I mentioned, the HP laptop is sort of flakey. I succeeded in copying the Thunderbird profile directory, but 23 GB worth of copying takes a long time on a single 5400 rpm laptop disk. I tried copying to a Mybook NAS device, but it was even slower. What eventually worked, not well, but adequately, was copying to a 250GB USB drive.
I decided to leave the HP out of it, and to do the recovery on the netbook, the only other Windows box available. I was able to create a second profile on the netbook, and restore the saved profile to it, slowly, but I realized Cathy would leave town before I finished all the steps, taking the netbook with her. Back to the HP.
First I tried just copying the IMAPMail subfolder files of mbox files and msf files to LocalFolders. This seemed to work, but Thunderbird got very confused about it. It said there were 114000 messages in Inbox, rather than 34000. This shortcut is a dead end.
I created a new profile on the HP, and restored the backup using MozBackup (which took 2 hours), and started it in offline mode. I then tried to “select-all” in Inbox to copy them to a local folder. Um. No. I couldn’t even get control back. Thunderbird really cannot select 34000 messages and do anything.
Because I was uncertain about the state of the data, I restored the backup again (another 2 hours).
This time, I decided to break up Inbox into year folders, each holding about 7000 messages. The first one worked, but then the HP did an undexpected shutdown during the second, and when it came back, Inbox was empty! The Inbox mbox file had been deleted.
I did another restore, and managed to create backup files for 2012 and 2011 messages, before it crashed again. (And Inbox was gone AGAIN)
The technique seemed likely to eventually work, but it would drive me crazy. Or crazier.
I was now accumulating saved Local Folder files representing 3 specific years of Inbox. I still had to finish the rest, deal with Sent Mail, and audit about 50 other subfolders to see if they needed to be recovered.
I wasn’t too worried about all the archived subfolders, since they hadn’t changed in ages and were well represented by my 2011 copy of Cathy’s server .maildir
Digression
What about server backups? Embarassing story here! Back in 2009, Win and I built some nice mini-ATX atom based servers with dual 1.5T disks run in mirrored mode for home servers. Win’s machine runs the IMAP, and mine mostly has data storage. Each machine has the mirrored disks for reliabiltiy and a 1.5T USB drive for backup. The backups are irregularly kept up to date, and in the IMAP machines case, not recently.
About 6 months ago, I got a family pack of CrashPlan for cloud backup, and I use it for my Macbook and for my (non IMAP) server, but we had never gotten around to setting up CrashPlan for either Cathy’s laptop or the IMAP server.
A few months ago, we got a Drobo 5N, and set it up with 3 3T disks, for 6T usable storage, but we haven’t gotten it working for backup either. (I am writing another post about that.)
So, no useful server backups for Cathy’s mail.
Well now what?
I have a nice Macbook Pro, unfortunately, the 500 GB SSD has 470 GB of data, not enough for one copy of Cathy’s cached mail, let alone two. I thought about freeing up space, and copied a 160 GB Aperture photo library to two other systems, but it made me nervous to delete it from the Macbook.
I then tried using Mac Thunderbird to set up a profile on that 250 GB external USB drive, but it didn’t work because the FAT filesystem couldn’t handle Mac Thunderbird’s need for fancy filesystem features like ACLs, but this triggered an idea!
First, I was nervous about using Mac Thunderbird to work on backup data from a PC. I know that Thunderbird profile directories are supposed to be cross-platform, but the config files like profile.ini and prefs.js are littered with PC pathnames.
Second, the USB drive is slow, even if it worked.
Up until recently, I’ve been using a 500 GB external Firewire drive for TimeMachine backups of the Macbook. It still was full of Time Machine data, but I’ve switched to using a 1T partition on the Drobo for TimeMachine. I also have the CrashPlan backup. So I reformatted the Firewire Drive to HFS, and plugged it in as extra storage.
Also on the Macbook, is VMWare Fusion, and one of my VMs is a 25 GB instance running XP Pro.
I realized I should be able to move the VM to the Firewire drive, and expand its storage by another 50 GB or so to have room to work on the 23 GB Thunderbird data.
To the Bat Cave!
It turns out to be straightforward to copy a VMWare image to another place, and then run the copy. Rather than expand the 25GB primary disk, I just added a second virtual drive and used XP Disk management to format it as drive E. I also used VMWare sharing to share access to the underlying Mac filesystem on the Firewire drive.

Copy VMWare image of XP to the Firewire drive
Copy MozBackup save file of the cached IMAP data and the various Local Files folders to the drive
Create second disk image for XP
Run XP under VMWare Fusion on the Macbook, using the Firewire drive for backing store
Install Thunderbird and MozBackup
Use Mozbackup to restore Cathy’s cached local copies of her mail from the flakey HP laptop
Copy the Local Files mailbox files for 2013, 2012, and 2011 into place.
Use XP Thunderbird running under VMWare to copy the rest of the cached IMAP data into Local Folders.
By hand, compare message counts of all 50 or so other IMAP folders in the cached copy with those still on the server, and determine they were still correct.
Go online, letting Thunderbird sync with the server, deleting all the locally cached IMAP data.
Create IMAP folders for 2007 through 2013, plus Sent Mail and copy the roughly 40000 emails back to the server.

Notes
During all of this, new mail continued to arrive into the IMAP server, and be accessible by the instance of Thunderbird on the netbook.
A copy of Cloudmark Desktop One was active running on the Macbook using Mac Thunderbird to do spam processing of arriving email in Cathy’s IMAP account.
My psyche is scarred, but I did manage to recover from a monstrous mistake.
Lessons

RAID IS NOT BACKUP

The IMAP server was reliable, but it didn’t have backups that were useful for recovery.

Don’t think you understand what a complex email client is going to do

Don’t experiment with the only copy of something! I should have made a copy of the IMAP .maildir in a new account, and then futzed with the netbook thunderbird to get the offline use storage the way I wanted.

Quantity has a quality all its own.

This quote is usually about massive armies, but in this case, the very large email (23 GB) just made the simplest operations slow, and some things (like selecting ALL in a folder with 34000 messages, impossible.) I had to go through a lot of extra work because various machines didn’t have enough free storage, and had other headaches because the MTBF of the HP laptop was less than the time to complete tasks.
-Larry