At the day job, I’ve been writing a new version of nbd-client. Instead of handing an open tcp socket to the kernel, it hands the kernel one end of a unix domain socket and keeps the other end for itself. This creates a block device where the data is managed by a user mode program on the same system.
In regular nbd-client, the last thing the program does is call ioctl(fd, NBD_DO_IT), which doesn’t return. The thread is used by the device driver to read and write the socket without blocking other activity in the block layer.
Because I need the program around to do other work, I called pthread_create to make a thread to call the ioctl.
Then I ran my program under gdb (as root!).
In another window, I typed dd if=/dev/nbd0 bs=4096 count=1
In the gdb window I saw
nbd-userland.c:525: server_read_fn: Assertion `0' failed.
and my dd hung, and the gdb hung, and neither could be killed by ^C
I was able to get control back by using the usual big hammer, kill -9 <gdb>
So what happened? My user mode thread hit an assertion, and gave control to gdb, which tried to halt the other threads in the process, which didn’t work because the thread in the middle of the ioctl was in the middle of something uninterruptible, and the gdb thread trying to do this also became uninterruptible while waiting.
It is going to be hard to debug this program like this.
The fix, however, is fairly clear: use fork(2) instead of pthread_create() to create a separate process to call the ioctl. That process will be isolated from the part of the program hitting the assertion.
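Here is a minimal sketch of that idea, not the actual nbd-client code. The names run_in_child, wait_for_child and fake_do_it are made up for illustration, and fake_do_it stands in for the blocking ioctl(fd, NBD_DO_IT) so the sketch can run without a real /dev/nbd0.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run fn(arg) in a child process.
 * Returns the child's pid to the parent, or -1 if fork() fails. */
static pid_t run_in_child(int (*fn)(void *), void *arg)
{
    assert(fn != NULL);
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: in the nbd case, fn would wrap ioctl(fd, NBD_DO_IT),
         * which blocks until the device is disconnected. */
        _exit(fn(arg) & 0xff);
    }
    return pid; /* parent: child's pid, or -1 on error */
}

/* Stand-in for the blocking ioctl (an assumption for this sketch). */
static int fake_do_it(void *arg)
{
    return *(int *)arg;
}

/* Wait for the child; return its exit status, or -1 on abnormal exit. */
static int wait_for_child(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Since the child is a separate process, gdb attached to the parent only has to stop the parent's threads, and none of those is stuck inside the ioctl.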
Older and wiser,
Larry
By the way, when you are trying to figure out where processes are stuck, look at the “wchan” field of ps axl. It will be a kernel symbol that will give you a clue about what the thread is waiting for.
UPDATE
Experience is what lets you recognize a mistake when you make it again.
The underlying bug was sending too much data on the wire. Like this:
struct network_request_header {
    uint64_t offset;
    uint32_t size;
};
write(fd, net_request, sizeof(struct network_request_header));
Well, no. sizeof(struct network_request_header) turns out to be 16, rather than, say, 12. If you think about it, this makes perfect sense, because otherwise an array of these things would have unaligned uint64_t’s every other time. You can’t do network I/O this way, especially if the program on the other end uses a different language or different compiler.
gcc, it turns out, has a feature, __attribute__((packed)), that makes this work, but it is not portable to other compilers.
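One portable alternative to relying on the in-memory struct layout is to serialize each field explicitly into a byte buffer of exactly the size the wire format calls for. This is a sketch, not what my program actually does: the helper names and WIRE_HEADER_LEN are made up for illustration, and big-endian byte order on the wire is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct network_request_header {
    uint64_t offset;
    uint32_t size;
};

/* Bytes on the wire: 8 for offset plus 4 for size, with no padding. */
#define WIRE_HEADER_LEN 12

/* Store a 64-bit value most-significant byte first. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Store a 32-bit value most-significant byte first. */
static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (24 - 8 * i));
}

/* Hypothetical encoder: writes exactly WIRE_HEADER_LEN bytes into buf,
 * whatever sizeof(struct network_request_header) happens to be. */
static size_t encode_header(uint8_t buf[WIRE_HEADER_LEN],
                            const struct network_request_header *h)
{
    put_be64(buf, h->offset);
    put_be32(buf + 8, h->size);
    return WIRE_HEADER_LEN;
}
```

The sender would then call write(fd, buf, WIRE_HEADER_LEN), and the receiver decodes the same 12 bytes field by field, so neither side depends on the compiler's padding or the machine's byte order.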
Home Networking Troubleshooting
Sometimes a technological scramble is triggered by the most mundane events. In this case, the season finale of “X Factor”.
Last night, there was a special church choir rehearsal for the Christmas Eve services, and all seven of Win’s and my kids went. Since the rehearsal would overlap the broadcast finale of X Factor, Erica asked Win to record it. Maybe the appearance of 1 Direction had something to do with it as well.
We used to have Replay TVs to solve things like this, and cable TV to deliver the bits, but the conversion to digital TV and the crazy anti-customer behavior of Comcast have changed all that. We don’t get cable, and the TV is hooked up to an antenna. We’ve also got a Silicon Dust HDHomeRun network tuner connected to the antenna on my front porch, so we can watch TV on any computer as well. Win has the copy of EyeTV that came with the HDHomeRun, and he planned to record the show.
About an hour before air time, he called to ask me about video artifacts and bad audio. I said I’d take a look.
I used hdhomerunner (a now lost Google Code project to develop an open source HDHomeRun control program) and directed the video to VLC running on my Macbook Pro. Indeed, the video was blocky and the audio spotty.
I power cycled the HDHomeRun, replaced the ethernet cable, and plugged it into a different port on the 16-port gigE switch. No change. I looked for firmware upgrades, and found the device running 4-year-old firmware. The upgrade went smoothly, but there was no change in video quality.
After sitting and swiveling back and forth for a while, I went back downstairs and plugged the device into the 100 Mbps switch instead of the 1000 Mbps switch. I had some vague memory that the negotiation doesn’t always work right. This fixed the problem and I was able to watch good video and audio with VLC.
Win called back to report his video was still breaking up. This suggested some other networking problem between the houses.
Background. Win and I are neighbors, and we have a conduit between the houses with a couple of outdoor-rated Cat 5 cables and a 6-fiber multimode cable. One pair of fibers is connected to 1000base-SX media converters at the two ends and plugged into the house gigE switches.
I remembered once setting up netperf on the home servers, and indeed it was still installed. Win’s house to mine reported 918 Mbps, but mine to Win’s reported 16! At this point, there wasn’t much time to debug the networking, and X Factor was about to start.
I remembered that VLC can record an input video stream, and set that up to record the program on my Macbook. (I had 45 GB free on disk, and the program was running at 2 Megabytes/second, so it would take 14 GB for the two hours. No doubt there is a way to transcode, but not enough time to learn how to do it!)
The VLC recording froze once, at about the one hour point, but I only missed a couple of minutes. I copied the files to an external USB drive for sneakernet delivery.
This morning, Win and I started taking a look at the networking. First, we got netperf running on our respective Macbook and iMacs, in order to figure out if the link was bad or one of the home servers. I was able to talk both ways to my server at about 600 Mbps, and Win to his at about 95 Mbps. Win’s results are explained by a fast Ethernet hop somewhere, but all these rates are way above the pitiful 16 Mbps across the fiber.
Next Win wiggled his connectors, dropping the path to about 6 Mbps. We swapped the transmit and receive fibers at both ends, and the direction of the problem did not change. It was looking more and more like a bad media converter.
I was staring at the wiring in my basement, wondering if we could use the copper link as backup while waiting for parts. It never worked very well, but we did use it to cross connect our DMZs before the firewalls at one point. I found the cable, and found it plugged into the ethernet switch on the back of my FIOS router – with LINK active! Huh? What was it plugged into at Win’s end? He reported it plugged into a small switch, but that it wasn’t easy to tell what else was plugged in.
As an experiment, we unplugged the copper link and … Win lost Internet access. Evidently (a) his routes were set to use the Serissa business FIOS rather than his home Comcast, and (b) the traffic was going over this moldy waterlogged Cat 5 instead of our supposedly shiny gigabit fiber. Now the gears are turning. If we did have a loop in the switch topology, then it was entirely possible that one direction between the houses would use the fiber while the other direction would use the copper. I don’t know much about how these cheap switches figure out things like that. We tried unplugging the fiber, forcing all traffic onto copper, but the netperf results were much worse. ping seemed to work, and ping -s 1000 gave fairly good results, but ping -s 1500 had a lot of trouble. That would explain why, generally, ping and ssh, with their small packets, seemed to work but netperf gave bad results.
We unplugged the copper and plugged the fiber back in, and after a few seconds, the asymmetrical performance resumed. I’ve placed an order for another media converter, and we’ll see if that fixes it. At least they now cost half as much as when we got the first pair!
So, there was a lot going on here.
The hdhomerun was plugged into a gigabit switch, and working poorly. Changing to fast Ethernet fixed that.
The topology loop was routing off-site traffic over a poor copper link, but it was working well enough that we didn’t notice.
The media converter is probably bad, working well in one direction but not the other, and probably that explains the poor video quality.
And Erica gets to watch 1 Direction.
How are just plain folks supposed to figure this stuff out?
UPDATE
The new media converter arrived… and didn’t fix the problem. Well, we have a spare now! The actual problem was a bad 8-port switch in Win’s basement, which we belatedly figured out once we had ruled out the fiber. We could have tested the link standalone by plugging computers into both ends, but we didn’t think of it. Does gigE need crossover cables to do that? Or does the magic echo cancellation make crossover cables unnecessary?
A Debugging Story
I’ve been working on fos at MIT CSAIL in recent months. fos is a factored operating system, in which the parts of the OS communicate by sending messages to each other, rather than by communicating by shared memory with locks and traps and so forth. The idea of fos is to make an OS for manycore chips that is more scalable than existing systems. It also permits system services to be elastic – to grow and shrink with demand, and it permits the OS to span more than one box, if you want.
The fos messaging system has several implementations. When you haven’t sent a message to a particular remote mailbox, you send it to the microkernel, which delivers it. If you keep on sending messages to the same place, then the system creates a shared page between the source and destination address spaces and messages can flow in user mode, which is faster. Messages that cross machine boundaries are handled by TCP/IP between proxy servers on each end.
I’ve been making the messaging system a bit more object oriented, so that in particular you can have multiple implementations of the user space shared memory message transport, with different properties. After I got this to pass the regression tests, I checked it in and went on to other stuff.
Charles Gruenwald, one of the grad students, started using my code in the network stack, as part of a project to eliminate multiple copies of messages. (I had added iovec support, which makes it easier to prepend headers to messages.) His tests were hanging. Charles was kind enough to give me a repeatable test case, so I was able to find two bugs. (And yes, I need to fix the regression tests so that they would have found these!)
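For readers who haven’t used iovecs, here is a sketch of why they help with headers. It uses the standard POSIX writev call as an analogy, not the fos messaging API, and send_with_header is a made-up name. The header and the payload stay in their own buffers and go out in one call, with no intermediate copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send a header followed by a payload with a single
 * writev() call, without copying them into one contiguous buffer.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t send_with_header(int fd,
                                const void *hdr, size_t hdr_len,
                                const void *payload, size_t payload_len)
{
    struct iovec iov[2];

    /* iov_base is not const-qualified, so cast; writev only reads it. */
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = (void *)payload;
    iov[1].iov_len  = payload_len;

    return writev(fd, iov, 2);
}
```

Each layer of a network stack can add its own header as one more iovec entry in front of the existing ones, which is where the saved copies come from.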
Fine.
Next, Chris Johnson, another one of the grad students, picked up changes from Charles (and me) and his test program for memcached started to break.
All the above is just the setup. Chris and I spent about two days tracking this down…
Memcached is a multithreaded application that listens for IP connections, stores data, and gives it back later. It is used by some large scale websites like facebook.com to cache results that would be expensive to recompute.
When a client sends a data object to memcached for storage, memcached replies on the TCP connection with “STORED\r\n”. On occasion, this 8 character message would get back to Chris’s client as “