February 2009 – A Rubble of Bits

Fun with the C Preprocessor

Recently I’ve been writing an implementation of OpenSHMEM for the SiCortex platform. OpenSHMEM is a communications API that lets you write PGAS programs in C or FORTRAN. PGAS stands for “partitioned global address space.” A PGAS program is a parallel program that can run on a lot of cores or cluster nodes simultaneously. Each processing element (PE) can read and write the address spaces of the other PEs, but the program “knows” that the global address space is partitioned into a bunch of local address spaces. This is either taken care of by the language, as in UPC (Unified Parallel C), or by the programmer explicitly, using something like OpenSHMEM.
All this is just introduction. OpenSHMEM derives from the Cray/SGI SHMEM API, and one of the things it has is a lot of API calls that differ only in the datatypes of their arguments.
For example, OpenSHMEM has

extern void shmem_short_wait(short *var, short value);
extern void shmem_int_wait(int *var, int value);
extern void shmem_long_wait(long *var, long value);
extern void shmem_longlong_wait(longlong *var, longlong value);
all of which wait for a local variable to be set by some remote PE.

I am a lazy programmer. I should write the same routine four times? Instead, I used some C preprocessor magic and wrote this:

#define EmitWait(type) \
void shmem_##type##_wait(type *var, type value) \
{ \
    while (*((volatile type *) var) == value) shmem_progress(); \
}
EmitWait(short);
EmitWait(int);
EmitWait(long);
EmitWait(longlong);
This uses the “token pasting” feature of CPP to write the proper function names for the different versions of the routine.
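To try the token-pasting trick in isolation, here is a minimal sketch. The shmem_progress stub and the demo_flag and demo_polls variables are stand-ins invented for this example (a real implementation would service the network there), and the longlong typedef is an assumption, since the type name has to be a single token before it can be pasted into a function name.

```c
/* Sketch only: shmem_progress, demo_flag and demo_polls are hypothetical stand-ins. */
typedef long long longlong;   /* assumed: the type must be one token to be pasted */

static int demo_flag = 0;     /* plays the role of a variable a remote PE will set */
static int demo_polls = 0;    /* counts how many times we polled */

/* Stub: pretend that a remote PE writes demo_flag on the third poll. */
static void shmem_progress(void)
{
    if (++demo_polls == 3)
        demo_flag = 1;
}

/* Each use of the macro emits one complete function definition. */
#define EmitWait(type)                                          \
void shmem_##type##_wait(type *var, type value)                 \
{                                                               \
    while (*((volatile type *) var) == value) shmem_progress(); \
}

EmitWait(short)      /* defines shmem_short_wait    */
EmitWait(int)        /* defines shmem_int_wait      */
EmitWait(long)       /* defines shmem_long_wait     */
EmitWait(longlong)   /* defines shmem_longlong_wait */
```

Calling shmem_int_wait(&demo_flag, 0) spins until the stub changes the flag, which is the same shape as waiting for a remote write.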
A bit later, I learned of the Global Address Space Performance initiative, a project at the University of Florida. By putting the performance analyzer library in front of the OpenSHMEM library on your search path, you can instrument the communications functions in your program without recompilation. This works via dynamic linking. The program calls shmem_long_wait, which is intercepted by the performance library. The library does whatever it does, then passes the call to pshmem_long_wait, which is provided by the OpenSHMEM implementation.
You can do this by providing two OpenSHMEM implementations, one with the names like shmem_long_wait, which is used when not profiling, and one with names like pshmem_long_wait, which is used when you are. Alternatively, you can use the “weak symbol” feature of the GNU runtime. A weak symbol defines something, but it doesn’t complain if an alternative definition is present in the address space. To make this work, you write all the functions as pshmem_long_wait, then add a weak symbol definition for the standard versions, like this:

#pragma weak shmem_long_wait=pshmem_long_wait

Now everything is in one library, and there is no performance penalty when you aren’t using the instrumentation library.
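A minimal sketch of the weak alias for a single function follows, assuming GCC (or a compatible compiler) on an ELF platform such as Linux. The demo_calls counter and the empty shmem_progress stub are invented for the example and are not part of OpenSHMEM.

```c
/* Sketch only: assumes a GCC-compatible compiler on an ELF platform. */
static int demo_calls = 0;            /* hypothetical counter, just for the demo */

static void shmem_progress(void) { }  /* stub standing in for the real progress engine */

/* The standard name is declared as usual ... */
void shmem_long_wait(long *var, long value);

/* ... but the code lives under the profiling name. */
void pshmem_long_wait(long *var, long value)
{
    demo_calls++;
    while (*((volatile long *) var) == value) shmem_progress();
}

/* Make shmem_long_wait a weak alias for pshmem_long_wait. A library that
   defines its own strong shmem_long_wait takes precedence over this alias. */
#pragma weak shmem_long_wait = pshmem_long_wait
```

With no instrumentation library present, a call to shmem_long_wait lands directly in pshmem_long_wait, so there is no extra layer of function call.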
Well, the obvious way for the lazy programmer to do this is like this:

#define EmitWait(type) \
#pragma weak shmem_##type##_wait=pshmem_##type##_wait \
void pshmem_##type##_wait(type *var, type value) \
{ \
    while (*((volatile type *) var) == value) shmem_progress(); \
}
but this fails because you can’t use the C preprocessor to write C preprocessor items like #pragma. No problem! C99 provides an alternate version of pragma exactly for this reason. The GNU info file says:

C99 introduces the `_Pragma' operator. This feature addresses a major problem with `#pragma': being a directive, it cannot be produced as the result of macro expansion. `_Pragma' is an operator, much like `sizeof' or `defined', and can be embedded in a macro.

Now I can write my macro with

_Pragma("weak shmem_##type##_wait=pshmem_##type##_wait")

right? Well, no. Token pasting doesn’t work inside strings! You can’t build up string constants this way. No problem! C compilers automatically concatenate adjacent string constants into a single longer string. This was originally done so you can avoid line wrapping, but whatever. I can write this

_Pragma("weak shmem_" #type "_wait=pshmem_" #type "_wait")

This is using a different preprocessor feature, called “stringification,” in which #type is replaced by the text of the macro argument, turned into a string constant.
Unfortunately, this doesn’t work either, because _Pragma is processed earlier in the compiler than other uses of string constants, and before the string concatenation happens. _Pragma has to have exactly one string constant as an argument.
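For contrast, here is a small sketch showing that stringification plus adjacent-string concatenation does work in an ordinary context, where the string is consumed after concatenation has happened. WAIT_NAME and is_int_wait are names made up for this illustration.

```c
#include <string.h>

/* Hypothetical helper macro: #type becomes a string, and the three adjacent
   literals are merged by the compiler into one string constant. */
#define WAIT_NAME(type) "shmem_" #type "_wait"

/* Returns 1 if name is the generated name for the int variant, else 0. */
static int is_int_wait(const char *name)
{
    return strcmp(name, WAIT_NAME(int)) == 0;
}
```

Here WAIT_NAME(int) behaves exactly like the literal "shmem_int_wait", which is what one would have liked _Pragma to accept.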
How much time have I spent on this? How many cases of _Pragma do I have to write by hand? I give up. The final version is

#define EmitWait(type) \
void pshmem_##type##_wait(type *var, type value) \
{ \
    while (*((volatile type *) var) == value) shmem_progress(); \
}
_Pragma("weak shmem_short_wait=pshmem_short_wait")
_Pragma("weak shmem_int_wait=pshmem_int_wait")
_Pragma("weak shmem_long_wait=pshmem_long_wait")
_Pragma("weak shmem_longlong_wait=pshmem_longlong_wait")
_Pragma("weak shmem_wait=pshmem_wait")
EmitWait(short);
EmitWait(int);
EmitWait(long);
EmitWait(longlong);

It has more typing than ought to be necessary, but I got over it.
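For what it’s worth, a commonly used workaround is to stringify the whole pragma text in one step with a helper macro, so that _Pragma still sees exactly one string constant. The sketch below assumes a GCC-compatible compiler on an ELF platform. DO_PRAGMA, the demo_polls counter, the shmem_progress stub and the longlong typedef are all introduced for this example, and whether this would have worked on the compiler used at the time is not something the text above establishes.

```c
/* Sketch only: assumes a GCC-compatible compiler on an ELF platform. */
typedef long long longlong;   /* assumed single-token name for long long */

static int demo_polls = 0;                          /* hypothetical poll counter */
static void shmem_progress(void) { demo_polls++; }  /* stub progress engine */

/* Stringify the entire argument, then hand that one string to _Pragma. */
#define DO_PRAGMA(x) _Pragma(#x)

/* The ## pasting happens while EmitWait is expanded, so by the time
   DO_PRAGMA stringifies its argument the names are already complete. */
#define EmitWait(type)                                           \
void shmem_##type##_wait(type *var, type value);                 \
void pshmem_##type##_wait(type *var, type value)                 \
{                                                                \
    while (*((volatile type *) var) == value) shmem_progress();  \
}                                                                \
DO_PRAGMA(weak shmem_##type##_wait = pshmem_##type##_wait)

EmitWait(short)
EmitWait(int)
EmitWait(long)
EmitWait(longlong)
```

The invocations are written without trailing semicolons, since the macro already ends in a complete pragma and a stray semicolon at file scope draws a pedantic warning.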

Hotel WiFi

Internet access is free at cheap hotels and costly at expensive hotels. Costly in money or inconvenience. What I want from hotels is a friction free experience, and mostly that is easier at mid-range hotels like, say, Hampton Inn, than it is at “better” hotels.
Case in point – the Omni Austin. I stayed there during the Supercomputing 2008 conference, and internet access was $9.99 per day or free for members of the frequent sleepers club. The system intercepts your web access until you log in by entering your frequent guest number or agreeing to the charge. The user experience is hideous.
Here’s what I wrote to the hotel:

The WiFi service at the Austin downtown is dreadful.

Slow, poor reception on 10th floor
Horrible signin system – The wifi being slow, but the web pages you go through to log in display very slowly
The promise of “returning you to the page you wanted” is a lie, you get sent to the Hotel home page. I do not want to look at your slow home page!
Daily login is incredibly annoying. After leaving my laptop on the desk, running, I get back and all of the network services and email have stopped working, due to your wifi cutting me off in the middle of my session.
The registration stuff makes it impossible to use my iPhone via wifi, because the login pages are too slow and too complicated and too tiny fonts to work on the small screen.

The design is just wrong anyway, because other net services like email simply fail to work, rather than giving any explanation. Then I have to guess that I have to use the web browser to click through your slow pages before I can read my mail. Personally, I think having paid wifi is counterproductive – much cheaper hotels just have free wifi. The cost to you is negligible, and the annoyance to your customers (me) is just stupid. I would rather stay in a Hampton Inn than an Omni. Instead, you raise your cost structure by having all this registration crap, and irritate the paying guests.

I got a letter back from Gene McMenamin, General Manager of the Omni Austin Downtown.

He offers his sincere apologies and says they are currently upgrading this service “to improve our connectivity to better accommodate the needs of our guests.”

I hope you get it right, Mr McMenamin.

Let’s go over it, step by step.

Internet access is not quite too cheap to meter, but it is close. ANY impediment to access in the name of cost recovery will reflect negatively on the hotel.
People who use POP or IMAP email services rather than web email cannot read their mail until they remember that nothing will work until they open a web browser and click through.
Don’t charge or intercept, but if you must, test it yourself to see how fast it is. A slow set of hard to read pages will just reflect negatively on you.
My iPhone will try to use your “free” wifi, but it will fail silently. Even if I try, the signon pages are hopeless on a handheld device.

In contrast, just putting in a free system has many benefits:

It is painless for the guests. Put the hotel name in the wifi ID, leave it at that.
It works for all devices
It works for all services, web or not
It works for business meetings
It works for visitors to your coffee shop and bar

Internet access that is really friction free, as well as free as in beer, leaves me a happy customer. What I remember about hotels with a bad internet experience is the same as what I remember about restaurants with slow service. The bad experience has destroyed every other good thing you’ve done.

Why do hotels shoot themselves in the foot like this?

Sometimes it is because they were forward looking, and installed wired internet years ago. All that stuff still isn’t paid for, while hotels who waited just dropped in a few access points. In a way this is the same story as cell phones – the US was early, and as a consequence, we have a junky system by world standards. These sunk costs are really an accounting problem, but instead of just writing them off, operators are driven by the bean counters to keep bad systems in place until their erroneous estimates of useful life are used up.

Another reason is corporate. The management fell victim to a slick salesman from a wifi access point company, so they signed a contract for paid service, and they are stuck with it.

The worst reason is marketing, and I think that is the problem at the Omni. It is $10 a day, which they think is cheap, so they make it free for frequent guests. Well, it isn’t free, it takes minutes of inconvenience for every user every day, and whose name is on the page they didn’t want? Why “Omni”! Good going. At least this is the easiest problem to fix.

Icebergs on the Assabet

I was on my way back from lunch at Thai Chilli’s to the SiCortex office in the Mill in Maynard, MA. On the way across the bridge over the Assabet I noticed ice in the water, flowing downstream.

Here’s another just coming out from under the bridge.

ice2 — Another one, from under the bridge

Getting at old RAID sets

After I got Ubuntu 8.10 working with MD RAID Sata drives, I wanted to move my old data onto the new drives. As I explained previously, the system would not boot with the old drives plugged into the primary IDE controller and the CD and extra drive plugged into the secondary IDE. It would boot with the old raid set plugged into the secondary IDE and the primary left unused.
Now, running on the Sata drives, I wanted to access the old drives, which were set up as a number of MD Raid-1 sets. After the break I’ll explain step by step how to find and mount the old raid sets.

The Great Sata Raid

I ran out of space on my home Linux machine. I suppose it started out life as a 350 MHz Pentium II in about 1998, but I think everything has been replaced since then. The current system is a dual processor (two cores total) Athlon 1.8 GHz machine with 1 GB memory. It has a Tyan Tiger MPX motherboard in an Antec Sonata case and a Matrox G550 graphics card. I run Gentoo Linux on it.
The disk system is software RAID-1 (mirrored disks) on a pair of 250 GB drives. These are on the primary IDE port. A third 250 GB drive and a DVD burner are on the secondary IDE.
I started reading in an accumulated pile of Sony MicroMV tapes (mpeg2) from the family camcorder because I am nervous that the camcorder is getting flaky. On tape 20, the disks filled up. The single drive is mostly full of other video (like 150 odd episodes of Good Eats pulled off the Replay TV with DVArchive).
All this is merely setup about why I have been trying to upgrade the storage. My colleague Jeff Darcy solved a similar problem with a QNAP NAS box, but I am kind of cheap and stubborn. Instead, I got a Syba SD-SATA2-2E2I SATA card and a pair of 1 TB Western Digital drives from Newegg. Oh, and I decided to give Ubuntu a try.
Step 1 – Get the controller to work
The first controller was DOA. This is always a little frightening, did I break it? But not this time. I know how to handle the cards, use good anti-static protocols, and so forth. This card, when plugged into any PCI slot, prevented the machine from booting at all. No beeps, no BIOS. Unplug, and it works. The Syba card is a little weird in that it has PCI slot cutouts for both 5 volt and 3.3 volt slots, but according to the Internet, it really only works in 5 volt slots. Fair enough, but dead. Newegg sent an RMA, and, eventually, a new card that behaved much better.
The next huge issue is BIOS extensions. I guess this is a good idea, so that random new controllers can come with the software to run them. The 2E2I comes with a fake RAID BIOS that is just a Bad Idea. I planned to not use it, and instead to use Linux MD software RAID.
A digression about storage policy.
I’d basically like to save everything forever. I still have files from my grad school days at Stanford. We had a PDP-11/34 running V7 Unix. Since then I’ve been through four companies and a variety of mail systems. I have nested archives, typically underneath trees of directories where each level is named “old stuff” or the equivalent. During my entire career, this stuff has been too large to easily put on off-line media, be it floppies, tapes, CDs, ZIP drives, or whatever. In any case, even if you believe the media are stable, and I don’t, the devices to read old media quickly become unavailable. I think the only real solution is to keep rolling your data forward onto new online devices.
Next. I am acutely aware of single points of failure. I am not really happy with only one of something, and I am not really happy with something that I can’t fix. With this home machine, everything is available commodity parts. I’ve had the memory break, the fans break, the power supply break, the graphics card break, and these are petty annoyances, but my data isn’t really at risk. I can fix the hardware, or if necessary, put the drives into another PC. Appliance NAS boxes make me nervous. It isn’t that the drives are flaky or the software buggy, it is that the rest of the hardware isn’t repairable. It isn’t commodity. If something breaks, you have to hope the company is still in business and deal with slow and expensive repairs. Jeff’s QNAP at least can copy itself to another QNAP, but then you really need two, and they aren’t inexpensive.
So I have a RAID system built with commodity parts, and I back that up to another system that is similar. I’d kind of like the backup to be offsite. Cloud storage for offsite makes sense, but 250 GB to 1 TB is sort of unwieldy. One of the other reasons I have for choosing the Syba card is that it has eSata ports so I can occasionally make copies on 1 TB external drives and leave them, say, at Grandma’s house.
Back to the hardware setup.
Since I don’t want the hardware RAID, I didn’t set it up. I learned from the Web that Ubuntu 8.10 can install onto RAID, but you have to use the Alternate installer. I followed directions, ignored installer complaints about an inability to reread partition tables, and got through the install…and it wouldn’t boot.
In fact, in the BIOS boot sequence screen, the Sata drives wouldn’t show up at all. According to the Internet, you can get around this by declaring one of the drives to be a concatenated drive with one sub-drive, but that did not work. I tried re-installing, thinking that setting up the concatenated drive scrambled grub, but that didn’t work. By the way, the Ubuntu installer should let you repeat individual steps, like installing grub, without wanting to repartition your drives. Just a thought.
Back to the web. Apparently, the controller has two BIOS flash images. The RAID version is standard, but you can reflash the card with the other image, which basically lets you use the controller in JBOD mode. The actual Sata chip is a Silicon Image 3124, and the flash images and installers are on their website. And they only work on DOS or Windows; don’t get me started on that.
I plugged the card into an old Sony Vaio desktop we have for games, and downloaded the installer for Windows, and it didn’t work. It could not find the card at all. The Windows XP device manager knew about it, but the flash installer didn’t. I guessed that the XP driver needed to be installed, and the “check for new hardware” actually worked without requiring a reboot. That never happens. After the driver was running, the flash installer worked ok.
Back to my system, now the disks show up in the boot menu, but won’t boot. Time for another install. Same warning messages, same lack of booting. This time I applied a Principle of Debugging: Do not ignore errors you don’t understand. I sent the literal error text to Google, and discovered that the message came from a patch to the Ubuntu installer to prevent it from failing when it encountered remnants of old RAID metadata. My guess is that this stuff was left lying around on the disks by the RAID version of the controller firmware. After more web searching, I found that the dmraid Linux utility, among other things, has the power to delete such things. I got to dmraid by using the install CD to give me a shell on my installed but not working root file system. I erased the RAID junk, but the system still would not boot.
Next I guessed that grub was not quite correctly installed, so I installed it manually, and wrong, so that now I have /boot/boot/<stuff> for some reason, but the system can now boot.
Copying old data
Next, I wanted to copy data from my old raid sets onto the new 960 gigabyte /home partition of the new ones. All this time, I had the primary IDE controller cable unplugged, so that there would be no way for the Ubuntu installer to erase my data. I plugged it in again, and the Sata controller disappeared! The old drives were now visible, and bootable, but once booted, lspci couldn’t see the new controller. Unplug the primary IDE cable and it came back. Weird. The CD drive and my other 250 GB drive on the secondary IDE were working with the Sata drives, but not the primary IDE.
Finally, I just unplugged the secondary, and plugged my old RAID drives into the secondary IDE connector on the motherboard. That worked, and now I have the new terabyte RAID system and my old data at the same time. As we speak, I am rsync’ing the contents onto the new drives.
It should go without saying. This is all way too complicated. And this isn’t really my primary computer anymore. Mostly I use a 17″ Macbook Pro. That I back up with an Apple Time Capsule with a 1 Terabyte drive. Yes, it doesn’t really meet my goals of repairability, but it just works.