Verizon FIOS Static IP routing outage

Update: 12/30, 10 AM – problem appears fixed. Will call to find out what it was.
The backstory:
So on 12/22, Win noticed that email to Cathy wasn’t being delivered.  She’s using an IMAP server here at Serissa Galactic HQ, and our mail gateway, hosted on a virtual machine at Rackspace, normally delivers her mail to the IMAP server.
By two days ago, we figured out that in fact we can’t establish TCP connections between the mail gateway and systems at Serissa that happen to use a particular one of our 5 static IP addresses.  The others work fine.
This is just weird, but the VZ-supplied Actiontec MI424 router is, well, just weird . . . but the problem isn’t the router.  After several hours of trying to configure various port forwarding and static NAT setups in the router, I called Verizon tech support.  After about 2 hours of phone hell, I got through to a fellow who was, well, clueful. It turns out you can set up screen sharing with them, and jointly click around in the router configuration screens.  The support rep eventually agreed with me that the problem existed, but at midnight on December 23, there was not much help to be had from Actiontec.  He suggested connecting our system upstream of the router with a switch, or using a different router if we had one.
I did not know that this is how Verizon FIOS static IP works, but it makes a lot of sense.  There is an Ethernet between the optical network terminal (ONT) and the Verizon supplied router, but you don’t have to use their router.  I unplugged the router and plugged in my macbook.  I said to Win “OK, I’m on the Internet… wait.  I am ON the internet!” I have not actually been directly connected, with no firewall in the way, since Arpanet days.  Cool.
We have five IP addresses, and with the macbook running tcpdump, it was easy to see what wasn’t happening. With the macbook configured with our .10 address, we would attempt to open a TCP connection to our cloud system, but never got any replies.  Attempts to open connections from the cloud end never showed up.  By running netstat on the cloud end, we could see connections in “SYN_RCVD” state, but never reaching ESTABLISHED.  Packets were going out, but not coming in.  Incidentally, and strangely, ping, traceroute, and other ICMP stuff worked fine.
By changing the macbook IP address from .10 to .11 (another of our static IPs), it worked fine.
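The .10 versus .11 comparison can be repeated without tcpdump. Here is a minimal sketch in Python, assuming the machine actually has both addresses configured; the function name and the addresses in the usage comment are placeholders, not our real ones.

```python
import socket


def can_connect(source_ip, dest_host, dest_port, timeout=5.0):
    """Return True if a TCP handshake from source_ip to the destination completes."""
    try:
        # source_address binds the local end first; port 0 lets the OS pick a port.
        with socket.create_connection(
            (dest_host, dest_port),
            timeout=timeout,
            source_address=(source_ip, 0),
        ):
            return True
    except OSError:
        # Covers timeouts (SYN/ACK never arrives), refusals, and bind failures.
        return False


# Hypothetical usage, with documentation-range addresses standing in for ours:
#   for src in ("192.0.2.10", "192.0.2.11"):
#       print(src, can_connect(src, "www.example.com", 80))
```

A False result only says the handshake did not finish within the timeout; it cannot tell you which hop ate the packets.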
This was enough evidence to open a trouble ticket at Verizon.  We were told that they would get back to us in 24 hours…NOT.
In the meantime, we changed our IMAP server to static NAT on a working IP address, and changed the port forwarding for inbound SMTP to match.  Now future email could be delivered, but 400-odd messages were stuck in the queue.  Win figured out how to add new Postfix rules to rerun the queue and translate the address, and we cleared the backlog.
Win also noticed that we can’t talk to www.dropbox.com, which may be hosted by Rackspace as well. The IP address is on a different class B, but it isn’t much different.  Same symptom.  We can’t talk to dropbox via our .10 address, but we can via .11 or other.
Christmas evening, after about 48 hours of silence from Verizon, I tried to get the trouble ticket status.  This is quite difficult. There is evidently no online way to do it, you have to go through phone hell.  After a few tries, I again got a skillful and helpful tech.    He told me that the ticket was assigned to the network techs, but there were no comments indicating anyone was working on it.  However, he searched around and found an outage report saying, roughly, that Massachusetts business fios static IP customers can’t talk to certain websites, and this outage report now had 75 trouble tickets linked to it.  He said he couldn’t tell me about other customers, but did mention trouble contacting www.experian.com, so I tried it.  We can’t talk to www.experian.com from .10 but we can from .11.
Our trouble ticket is now number 76, but there is no clue about who or when anyone might work on the problem.  Evidently other folks are much worse off than us, with their credit card processing machines unable to talk to the processors.
I will call back tomorrow or Monday to see what is going on.
I find this fascinating, but now fairly stress-free since Cathy’s email has been delivered.  What could cause reliable lossage of TCP connection setup, between stable, but seemingly random addresses?  Works fine for ICMP, but fails every time for SYN packets.  fios-10.serissa.com fails, but fios-11.serissa.com works. www.experian.com fails, but www.google.com works.  Maybe a corrupted hash table somewhere?  It seems like a very subtle and mysterious kind of thing.
Oh.  This blog is hosted by our cloud system, so I can’t talk to it via FIOS.  I’ve changed my laptop’s default route to use Win’s Comcast DSL instead, which works fine.  More proof that having a gigabit fiber between our houses is just a good idea.
One of the many problems with the Internet is that most people are at the mercy of their ISP.  The ISP controls the last mile and you have no real alternative.  Serissa happens to have both FIOS and Comcast links, but that isn’t as useful as you might think.  Inbound traffic knows about one or the other, and failover is manual and tedious.  I think we need an ASN so we can just let BGP deal with this, but that solution doesn’t scale well.
Update 12/26/2010 9 PM
We’ve found that our other IP addresses also don’t work … to different sets of sites.  For example, .11 can’t reach www.patternreview.com.
I called Verizon at 888 244 8880 to report this and to find out ticket status.  I was on hold for 35 minutes and reached a fairly clueless agent this time.  He couldn’t get any information out of the network technician group, which probably means that no one is working on the problem.  He was able to pull up the group outage report RIEH032H87.
I asked why I couldn’t get online status, and he said that because my trouble ticket is linked to a group ticket, I can’t see status anymore.  That seems unlikely.
I’ve created a #fios hashtag on Twitter, just for fun.
Update Monday 12/27/2010 11 PM
I called Verizon again to find out if there is any progress.  Evidently the problem has been passed up from the network technicians to IP Engineering, and the NOC.  This seems good.  However, according to the rep I talked to, they are looking into a theory that traceroutes along affected paths are showing the trouble outside the Verizon network.
That doesn’t match what I see.  As an example, from our .10 IP address, we cannot reach www.stewart.org (this blog).  However, traceroute works.  From our .11 IP address, we cannot reach www.patternreview.com (never mind), but traceroute works.  From .10, patternreview works fine, and from .11, stewart.org works fine.
Here’s (part of) the trace for .11 to patternreview.com

4  so-7-2-0-0.bos-bb-rtr2.verizon-gni.net (130.81.29.174)  12.707 ms  4.781 ms  4.873 ms
5  ge-1-2-0-0.ny325-bb-rtr2.verizon-gni.net (130.81.17.24)  14.932 ms  13.907 ms  13.301 ms
6  0.ae4.br3.nyc4.alter.net (152.63.16.185)  23.043 ms  12.119 ms  12.602 ms

Here’s part of the trace for .10 to www.stewart.org
4  so-7-2-0-0.BOS-BB-RTR2.verizon-gni.net (130.81.29.174)  9.101 ms  9.121 ms  9.028 ms
5  0.so-0-2-0.XL4.BOS4.ALTER.NET (152.63.16.141)  18.682 ms  18.757 ms  21.011 ms
6  0.xe-4-1-0.XL4.NYC4.ALTER.NET (152.63.3.102)  21.096 ms  19.589 ms  19.308
The only common elements there are Verizon (and the fact that the paths both go into Alternet).
Both traceroutes work all the way to the destinations, it is just TCP SYN/ACK packets that don’t come back.
I’ve heard a theory that someone is blacklisting fios addresses.  Until yesterday, we never used .11 for outbound connections, so I am skeptical.
In other news, we got about 14 inches of snow here. The kids are happy.
Update Tuesday 12/28/2010
Today’s wait on 888-244-8880 was 28 minutes.  Verizon needs better music on hold.
The representative today said the problem affects 71.x.x.x addresses (true) because when the 71 addresses were assigned to Verizon, website admins were notified to unblock them, but some never did.
This is a fairly pathetic claim.  We’ve had the addresses for 5 years, they worked fine until a week ago, I control a machine I can’t talk to from one of my addresses, and ICMP traffic works fine, just not TCP.
It sounds like Verizon still has a theory about websites blacklisting Verizon addresses.  I think it is much more likely that some fancy router in the broken paths has a bad memory module.  My guess about which one is based on the rather small differences between traceroutes of working paths and non-working paths. All of the non-working paths I know about pass from Verizon to Alternet in New York, for example, before branching off into other networks.  Try rebooting
6  0.ae1.BR2.NYC4.ALTER.NET (152.63.18.37)  26.290 ms  24.964 ms  24.691 ms
and see if that helps…
Update 12/29 at 11 PM
I called Verizon again.  As expected, there was a 35 minute wait on hold, and the representative said “they are still working on it”.  I asked for a supervisor and got very little more.  There are now 120 tickets linked to the group outage (up from 75), but there have been no comments added to the log since 12/27.   I suggested that this certainly gave me the impression Verizon didn’t take the problem very seriously.

Cloud Deduplication

Here’s a thought experiment.
Consider the problem of online storage of music libraries.  There are various free sites that do this, and Apple is rumored to be planning cloud storage for users’ iTunes libraries, making it possible for a user to stream his or her music from anywhere.
It is fairly obvious that there is no need to store every copy of a song separately.  In the enterprise market, the idea of storing only one copy of each file, and then keeping track of who has a copy is called “deduplication”.
As an aside, this can work at the block level rather than the file level, and it can work even when common blocks of data in files are misaligned, due to some fairly cool technology called rolling hashes.
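For the curious, here is a minimal sketch of a rolling hash in Python. It is a plain polynomial hash of the Rabin-Karp kind, and the constants and function name are my own choices for illustration; real deduplication products use their own schemes. The point is that each new window hash is computed from the previous one in constant time, so the same run of bytes gets the same hash no matter where it sits in the file.

```python
BASE = 257             # assumed multiplier, any value larger than a byte works
MOD = (1 << 61) - 1    # assumed large prime modulus


def rolling_hashes(data: bytes, window: int):
    """Yield (offset, hash) for every window-sized slice of data."""
    if window <= 0 or len(data) < window:
        return
    top = pow(BASE, window - 1, MOD)  # weight of the byte about to leave the window
    h = 0
    for b in data[:window]:
        h = (h * BASE + b) % MOD
    yield 0, h
    for i in range(window, len(data)):
        h = (h - data[i - window] * top) % MOD  # drop the oldest byte
        h = (h * BASE + data[i]) % MOD          # bring in the newest byte
        yield i - window + 1, h


# The same 8 bytes hash identically whether they start at offset 0 or offset 3.
aligned = dict(rolling_hashes(b"abcdefgh", 8))
shifted = dict(rolling_hashes(b"zzzabcdefgh", 8))
print(aligned[0] == shifted[3])
```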
Now the upload phase of storing music into the cloud cannot work by just artist and title, because there may be many performances of a work that were separately recorded and are distinct.  Classical music fans are especially devoted to particular recordings of their favorites.  Consequently, the upload is likely to be accomplished by sending the hash of your local file, and the cloud server immediately says “yup, got that one!”  The user says, “Man, that upload was fast!”
Now it seems entirely possible to game the system, by sharing, not music, but <hashes of music>.  I go to the music hash warez site, and grab a set of hashes, and then I use my slightly hacked music uploader to say “Here are the hashes of the music I want to upload” and the cloud server says “yup, got those!”
Then, later, I can stream all the music, or reload my local copy after my local library is “accidentally lost”.
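To make the thought experiment concrete, here is a toy sketch in Python of a store that deduplicates by hash. Everything in it (the class, the method names, the choice of SHA-256) is made up for illustration and describes no real service. The `claim` method is the fast "yup, got that one!" path, and it is exactly the path a hacked uploader would abuse.

```python
import hashlib


class DedupStore:
    """Toy single-copy store: one blob per digest, many owners per blob."""

    def __init__(self):
        self.blobs = {}    # digest -> file bytes, stored once
        self.owners = {}   # digest -> set of user names

    def upload(self, user, data):
        """Slow path: the user sends all the bits and the server hashes them."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        self.owners.setdefault(digest, set()).add(user)
        return digest

    def claim(self, user, digest):
        """Fast path: the user sends only a hash and is believed."""
        if digest not in self.blobs:
            return False
        self.owners[digest].add(user)
        return True

    def fetch(self, user, digest):
        """Stream the file back to anyone recorded as an owner."""
        if user not in self.owners.get(digest, set()):
            raise PermissionError("not an owner")
        return self.blobs[digest]


store = DedupStore()
digest = store.upload("alice", b"a particular recording")
# Mallory never had the file, only the hash from the warez site.
store.claim("mallory", digest)
print(store.fetch("mallory", digest) == b"a particular recording")
```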
So is it illegal to share hashes of music? Is the hash copyrighted?  Is this a bug in cloud deduplication? It would be a shame to require users to upload all the bits, just so the hashes can be computed in a trusted environment.
One possible solution is to compute keyed hashes. For example, the cloud server says “compute your music hashes using this personally customized algorithm”.  Of course, then the user can merely forward the instructions to a friend who <does> have the music.  Is it a crime to compute a hash function for someone?
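Here is one way the keyed hash idea might look, as a sketch in Python. I am assuming an HMAC over the file with a fresh random key per upload attempt; the function names are hypothetical. A bare hash copied from a website no longer works, but as noted above, anyone who really has the file can still compute the answer for a friend.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """Server side: a fresh random key for this one upload attempt."""
    return secrets.token_bytes(16)


def respond(challenge, data):
    """Client side: keyed hash of the file, which requires having the bits."""
    return hmac.new(challenge, data, hashlib.sha256).hexdigest()


def verify(challenge, stored_data, response):
    """Server side: recompute over the stored copy and compare in constant time."""
    return hmac.compare_digest(respond(challenge, stored_data), response)


song = b"a particular recording"
challenge = make_challenge()
print(verify(challenge, song, respond(challenge, song)))           # real owner
print(verify(challenge, song, hashlib.sha256(song).hexdigest()))   # shared bare hash
```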
-L

Order of Operations – Evil and Pernicious

Back in November, my son came home with a 6th grade math test in which he lost a point because he put in parentheses that were not strictly necessary, according to the order of operations.
Here’s the note I sent to the math teacher:
I’ve been meaning to write about this, but not getting around to it.  I am moved to write because on Alex’ recent math test, he lost a point because he put in parentheses that were not necessary due to the order of operations.
I’m not going to argue about the grading, which is fine given the syllabus, but rather I want to express my view that teaching order of operations at all is evil and pernicious.
The only correct way to handle math is to always put in all the parentheses.  Here’s why.
In 6th grade math, the order of operations is pretty simple, multiply and divide are “stronger” than addition and subtraction.  Once you get to the rest of mathematics, and then to programming languages, the situation becomes impossible.
I hate to cite wikipedia, but this article is relevant.
http://en.wikipedia.org/wiki/Order_of_operations
Just look at the page and its examples and just the visual impression of vast complexity is there.
It is beyond dangerous to teach these things <and expect folks to remember them>. They won’t remember the details, but <will think they know>.  Smart, capable engineers will write expressions, thinking they understand what they mean, and <they will be wrong>.
A few years ago I was working at SiCortex, and we built a custom chip with about 150 million transistors, as part of a supercomputer.  The logic is expressed in the VHDL programming language, which, like many, has a defined order of operations.  An engineer did something quite innocuous, confusing the order of operations of logical OR and bitwise AND, and in consequence the mathematical expression meant something quite different than intended.  This was caught quite by accident, but had it gone through, the cost would have been a half million dollar replacement chip mask and about 3 months of schedule.
I very strongly feel that order of operations is a quaint dated idea that we really need to stop teaching and stop depending on.  If you always specify exactly what you mean by grouping operations with parentheses, you and the computer will always agree about what the math means.
This also means that putting in the parentheses, even if not needed, is a good idea: it makes the meaning of the expression clear without any risk.  This sort of care should be applauded, not penalized!
Some programming languages, like LISP, get this right – they don’t allow chained operations at all, and have no need for order of operations.  Of course they don’t even use infix operators.  In LISP, one says (+ 3 4) or (+ (* 2 4) (* 5 6)) and there is never any confusion about it.
-Larry
PS  Don’t get me started about mean, median, and mode.  After 4th grade, has anyone actually used Mode?
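For programmers, the kind of mistake described in the note is easy to reproduce in miniature. This sketch is in Python rather than VHDL, and the function and signal names are invented; the only fact it relies on is that Python's `&` binds more tightly than `|`.

```python
def intended(enable, ready, mask):
    # What the engineer meant: combine first, then apply the mask.
    return (enable | ready) & mask


def as_written(enable, ready, mask):
    # What was typed: Python parses this as enable | (ready & mask).
    return enable | ready & mask


# With the mask off, the two versions disagree.
print(intended(1, 0, 0), as_written(1, 0, 0))
```

The extra parentheses in `intended` cost nothing, and they are the whole difference.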

Type conversion run wild

Many languages have the idea that if you assign a value of one type to a variable of another type, then the value will be converted to the same type as the variable.  So in C, for example
float x = 2;
converts the integer “2” to a floating point “2.0” before assignment.
So far so good.
Today I received an email with the following header field:
From: java.lang.NullPointerException@248257-web11.element115.net
This is just outstanding! My best idea of how this happened is that a function intended to return a value of type email-address instead threw an exception, which was faithfully type-converted to an email address.
I will send a reply, just to see what happens.
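Here is my guess at the shape of the bug, sketched in Python instead of Java. All the names are hypothetical, and I have no idea what the real code looks like; the sketch only shows how an error can be quietly turned into data.

```python
def lookup_sender(user_id, directory):
    """Return the mailbox name for user_id, or raise KeyError if unknown."""
    return directory[user_id]


def sloppy_from_header(user_id, directory, domain):
    try:
        local_part = lookup_sender(user_id, directory)
    except Exception as exc:
        # The bug: the exception is converted to a string and used as if
        # it were the value the lookup was supposed to return.
        local_part = type(exc).__name__
    return f"From: {local_part}@{domain}"


print(sloppy_from_header("nobody", {}, "example.net"))
```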

Why Peer to Peer IM is Good

Last Friday our FIOS Router died.
Actually it started to die two weeks ago, but I didn’t recognize the problem.  At first, my Macbook started failing to connect with the WiFi.  I was puzzled, but just switched to another access point.  Friday, however, the router started dropping packets on the LAN.  My daughter, for whom latency is a matter of WOW life and death, got on Bonjour to complain about the service to my neighbor Win.  (We have fiber in a conduit between the houses for Serissa Research).  Win in turn IMed me over Bonjour.  We found we couldn’t even connect to the FIOS router web server.
Down to the engine room…er.. basement.  I turned off the router and turned it on again, and it didn’t even light up.  Then I discovered the wall wart power adapter was super hot.  Never one to ignore a clue, I said “Aha!” to myself.  5 Volts DC at 3 Amps.  I’m an engineer, how hard could it be?  After rummaging through various boxes, I found a 5 Volt at 2 Amps supply from some DLink thing with the right connector, and we’re up and running again, for the moment.
Verizon says they will send a new power supply.
The moral?  Peer to peer IM like Bonjour lets the kids complain about the internet service without having to walk downstairs.  Those old-technology centralized things like AIM and Jabber only work when the Internet works.
This issue is also why I am leery of custom home NAS boxes like Drobo, for all their good properties.  I want something I can fix using junk PC parts late at night on a weekend, not something that requires a week turnaround time.  I am not religious about this, and you will pry the Time Capsule from my cold fingers.

The Trouble with Multicore

David Patterson has written a nice article about the advent of multicore processors.

See http://spectrum.ieee.org/computing/software/the-trouble-with-multicore
Patterson is right that multicore systems are hard to program, but that isn’t the biggest problem with multicore processor chips.  The real problem is architectural – their memory bandwidth doesn’t scale with the number of cores.
I’ve been programming multiprocessor systems since 1985 or so.  At the Digital Systems Research Center we built a series of multiprocessor workstations with up to 11 VAX cores. Later I worked at SiCortex where we built machines with up to 5832 cores.
By the way.  I know that people say “core” when they mean “processor” and they say “processor” when they mean “chip”.  I find this confusing.  My answer is to use “chip” and “core” and avoid the overloaded “processor”.
At Digital, we thought multiple threads in a shared memory were the right way to code parallel programs.  We were wrong.  Threads and shared memory are a seductive and evil idea.  The problem is that it is nearly impossible to write correct programs.  There are people who can, like Leslie Lamport, Butler Lampson, and Maurice Herlihy, but they can only explain what they do to people who are almost as smart as they.  This leaves the rest of us on the other side of a large chasm.  Worse, some of us <think> we understand it and the result is a large pool of programs that work by luck, typically only on the platform that the programmer happens to have on their desk.
Threads and locks are a failed model.  Their proponents have had 25 years to explain it, and it is too hard.  Let us try something else.
The something else is distributed memory – a cluster.  Lots of cores, each with its own memory, connected by a fast network.  This model, in the last 15 years, has been spectacularly successful in the High Performance Computing community.  Ordinary scientists and engineers manage to write useful parallel programs using programming models like MPI and OpenMP without necessarily being wizard programmers, although that does help. Distributed memory parallel programs tend to be easier to write and easier to get right than the complicated mess of threads and locks from the SMP (symmetric multiprocessing) community.
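To show the flavor of the message-passing style, here is a minimal sketch in Python. It is not MPI. Python threads and a queue stand in for cluster nodes and the interconnect, purely as an assumption to keep the example self-contained and runnable. What matters is the discipline: each worker gets a private copy of its data and communicates only by sending a message, so there is nothing to lock.

```python
import queue
import threading


def worker(rank, chunk, outbox):
    # The worker touches only its own chunk, the way a cluster node
    # touches only its own memory, then sends one message with the result.
    outbox.put((rank, sum(chunk)))


def parallel_sum(values, workers=4):
    outbox = queue.Queue()
    # Scatter: every worker receives a private copy of a slice of the data.
    chunks = [list(values[i::workers]) for i in range(workers)]
    threads = [
        threading.Thread(target=worker, args=(rank, chunk, outbox))
        for rank, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Gather: combine one partial result per worker.
    return sum(outbox.get()[1] for _ in range(workers))


print(parallel_sum(list(range(1000))))
```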
The other huge advantage of clusters over shared memory machines is that the model scales without heroics.  The memory bandwidth and memory capacity scale with the number of cores. It is possible to build fast low-latency interconnect fabrics that scale fairly well. Clusters are <also> a good match for programs in another way.  Patterson cites the sorry history of failed parallel processor companies, but he didn’t mention that every one of them had a unique and idiosyncratic architecture, for which programs had to be rewritten.  The lifetime of a successful application is 10 or 15 years. It cannot be rewritten for every new computer architecture, or every new company that comes along.  The processing model for clusters has not required so much rewriting.  A cluster that runs Unix or Linux, and supports C, Fortran, and MPI, can run the existing applications.
So my modest suggestion to Intel is to not bother with larger SMP multicore chips. No one knows how to program them.  Instead, give us larger and larger clusters on a chip, with good communications off-chip so we can build super-clusters.  Don’t bother with shared memory, it is hard anyway. Give us distributed memory that scales with the number of cores. I too am waiting for breakthroughs in programming models that will let the rest of us more easily write parallel programs, but we already have a model that works just fine for systems up to around 1000 cores.  No need to rewrite the world.
Side notes:
Someone is going to point out how wonderful shared memory is, because you can communicate by simply passing a reference.  Um.  The underlying hardware is going to copy the data <anyway> to move it to the cache of a different core.  If you just do the copy, you are not really doing much extra work, and you get the performance benefits of not having to lock everything.
Yes, I dislike MPI.  Its chief benefit is that it actually works really well, at least as long as you stick to a reasonable subset.  I really like SHMEM, and would prefer even a subset of that.

Redbox Bad User Interface

We got a promotion code for a Redbox movie rental somewhere, and I went to use it.
Evidently, the machine was not on its home screen when I got there, but on the search screen instead.  Consequently, the “rent with a promotion code” button was not visible.  I followed the only path available, which was swiping my credit card.  I expected a “promotion code” box on the checkout screen, but there wasn’t one.  Instead I got the dvd out of the slot.  Alice in Wonderland, if you want to know. <After this> the home screen came up, showing the “rent with promo code” option, but it was too late.
Well this is bad, I thought.  My first instinct was to just put the disk back in, but I decided to call customer service first.  They said you have to be on the home screen and I would be charged, but on the bright side, my promo code was still good.
This sort of thing enrages me, and I didn’t want to waste any more time on them, but Cathy said to call back and ask for a supervisor.
The new agent, (Hi Jessica!), gave me the same story, but kindly went off to fetch a supervisor.  I never got to talk to him or her, but they say they are fixing it for me, and will refund the dollar when I return the disk, and have cancelled the code.
Good – Redbox seems to be doing the right thing here, but it cost me a lot of time calling them, and it is costing them two expensive customer service calls.
Bad – The UI design seems wrong.  It shouldn’t be modal, with the customer having to choose between pay or promo before starting!  In my case, I never even saw the choice because when I got to the machine it was already on a search screen down the “pay” path.  It is possible some previous person left it that way.  I don’t know if there is a timeout back to the home screen, but adding one is the wrong fix!  A better design is to have the promo code screen as part of the checkout flow, where the “what is your email” already is.  Doesn’t even add a screen that way.
I’ll let you know if the credit doesn’t appear.  Netflix is starting to look better already, although we don’t watch many movies.

Texting

There is a gap in function between mobile phones and landlines.  Mobiles can send and receive text messages and landlines cannot.  This should be fixed.
Idea 1 – text messages for landlines
The network already has technology to send caller ID. The same signalling could be used to transmit text messages.  Caller ID works by sending an ASCII character string to the phone between the first two rings. The signalling is at 1200 bps, and could last for three seconds, giving room for up to 450 characters.  Since texts are shorter than that, the central office could repeat the message for reliability, or we could add technology to let the phone acknowledge the signal.  The upstream ack could be modem tones or touch tones, which the phone can already generate.
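The 450 character figure is just arithmetic, sketched here in Python. The 8 bits per character is what the estimate above implies; the 10 bit case is an assumption about framing (a start and a stop bit around each character) and is included only to show how much the answer moves.

```python
def max_characters(bits_per_second, seconds, bits_per_char):
    """Whole characters that fit in the signalling window."""
    return int(bits_per_second * seconds) // bits_per_char


print(max_characters(1200, 3, 8))    # 8 bits per character, no framing
print(max_characters(1200, 3, 10))   # assumed start and stop bit per character
```

Either way there is room for a 160 character text with space left over for a repeat.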
Since current phones generally do not have good displays, and don’t expect text messages, this capability could be added to answering machines instead of phones.
* The answering machine could record the text for later display
* The answering machine could send a tone or modem acknowledgement
* In principle, the answering machine could have cell-phone like software to generate texts using T9 predictive keyboarding, or use a full keyboard.
* An answering machine with this capability could answer calls with a distinctive beep or tone sequence that informs the CO and equipment at the caller end that the capability to receive texts was present.
I know this is starting to sound like Minitel!
Idea 2 – Texting <during> calls
I frequently call someone to get a phone number.  On other occasions, folks call me to get a number. We do this in speech.  The recipient has to write down the number or try to remember it.  When the caller is mobile, this adds to the danger of using the phone while driving.
Why not make it easy to send numbers during the call?
* If you press the keypad during the call, the phone will send touch tones.
* The receiving phone should detect this, and greatly attenuate the earphone path, so the tones don’t blast the ear, and should remember the digit sequence.  After the call, the phone could make those numbers available to dial.
* On the iPhone, and probably others, in the contact list there is a “share contact” button. This can send a contact via text.  If you try to do this during a call, the phone could send the number in-band as above.
* For landline users, an answering machine (as above) could listen in parallel to an arriving call and record such in-band messages.
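The receiving side of this is not hard. Below is a minimal sketch in Python that generates a touch tone and recognizes it with the Goertzel algorithm. The row and column frequencies are the standard DTMF ones; the sample rate, tone length, and function names are my own assumptions, and a real phone would also have to cope with noise, speech, and timing.

```python
import math

ROWS = (697, 770, 852, 941)       # standard DTMF row frequencies, Hz
COLS = (1209, 1336, 1477, 1633)   # standard DTMF column frequencies, Hz
KEYS = ("123A", "456B", "789C", "*0#D")


def dtmf_samples(key, rate=8000, duration=0.05):
    """Return samples of the two-tone signal for a single keypad character."""
    for r, row in enumerate(KEYS):
        if len(key) == 1 and key in row:
            low, high = ROWS[r], COLS[row.index(key)]
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    n = int(rate * duration)
    return [
        0.5 * math.sin(2 * math.pi * low * t / rate)
        + 0.5 * math.sin(2 * math.pi * high * t / rate)
        for t in range(n)
    ]


def goertzel_power(samples, freq, rate=8000):
    """Signal power near freq, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples, rate=8000):
    """Pick the strongest row tone and the strongest column tone."""
    r = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i], rate))
    c = max(range(4), key=lambda i: goertzel_power(samples, COLS[i], rate))
    return KEYS[r][c]


number = "6175550123"  # made-up number
print("".join(detect_key(dtmf_samples(d)) for d in number))
```

A phone doing this during a call would mute the earphone as soon as it sees a strong row and column pair, and append the digit to a list for later.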
Rich phone companies, send your licensing inquiries to me!

iPad and iPhone – open for a fee

There is a lot of grumbling out there about how the iPad and the iPhone are closed environments, and you can only run programs that are blessed by Apple, and available through the App Store.
It occurs to me that you are perfectly free to write software for your own use, simply by joining the Apple developer program.  You can download the SDK and write code anytime.  If you want to test the code on a real device, then you need to join the paid developer program, which currently costs $99/year.
As long as you don’t use pre-release features of the iPhone OS, you may even be able to share your source code with others.  There are lots of sites out there with code examples.
So perhaps the iPhone and iPad are not closed at all: you can run Apple approved software for no extra charge, and you can run anything you write for $99/year.

Chuck Thacker wins ACM Turing Award

This week Chuck Thacker has won the ACM Turing Award. This is good.
The best article I’ve seen is this one from Microsoft:
Microsoft Press Release on Chuck Thacker
I had the privilege of working near Chuck at Xerox and for him at Digital, back in the day.
I started at Xerox as a grad student intern in 1977, working for Ted Strollo in the Systems Sciences Lab (home of Smalltalk, Alan Kay, Chuck Geschke, and John Warnock).  My first project was a power line carrier communications modem.  I got to <use> an Alto, which was by itself a transforming experience.  Technically this project wasn’t that interesting, but it came out well enough that I was able to talk the lab into letting me stick around.
My next project was with John Shoch and the DARPA Bay Area Packet Radio network, which was a network of packet radios around the San Francisco Bay Area, at 100 to 400 Kbps.  This was in 1978, mind you, a few years before WiFi.  PARC’s part of the project was to provide packet switching experience and my part of the project was to design the hardware to interface the Alto to the packet radios.
Rudyard Kipling said “An engineer can do for 10 cents what any fool can do for a dollar.”  Chuck Thacker, the engineer’s engineer, could do for 10 cents what a mortal engineer couldn’t do at all.  With the Alto, in the mid ’70s, that meant building a six MIPS minicomputer, 128 Kbytes of memory, 5 MB disk, and million pixel display, for $20,000 or so.  I got to know the innards of the machine fairly well, designing the BBN-1822 interface for the packet radio, and writing the microcode and device driver for it.  The Alto had extreme economy of design.  The CPU executed a 32 bit microinstruction every 170 nanoseconds, and “hyper-threaded” between 16 micro tasks.  The lowest priority task ran an emulator for whatever high level instruction set you wanted: Nova-like for BCPL, bytecodes for Mesa, and different bytecodes for Smalltalk. The other micro tasks were responsible for the Ethernet, the disk, the display, and whatever else got plugged in, like a laser printer controller, or a packet radio.
This capability let you design I/O device controllers that were much simpler than they had any right to be. The 1822 interface turned into a couple of shift registers and  a couple of PROM-based state machines, plus a modest handful of microinstructions. (Dave Boggs of Ethernet fame taught me how to build the state machines.)
I was pretty young then, and I didn’t immediately realize how amazing this was. I had designed things like a tape drive interface for an Interdata and a color display controller, that took up entire boards, but this stuff was <tiny>.  The whole Alto was like that.  The disk controller was essentially the same, all datapath and no control.  The disk microcode would wake up once a sector and ask itself “is this the right place to start transferring data?”
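For readers who have never seen a machine like this, here is a toy sketch in Python of the microtask idea described above: on every cycle the highest-priority task that wants service runs one microinstruction, and the emulator gets whatever cycles are left. The task numbers, the "higher number wins" rule, and the function itself are assumptions for illustration, not a description of the actual Alto hardware.

```python
EMULATOR = 0  # assumed: the lowest-priority task, always ready to run


def schedule(cycles, wakeups):
    """Return the task that runs on each cycle.

    wakeups maps a cycle number to {task: microinstructions_needed}.
    """
    pending = {}
    trace = []
    for cycle in range(cycles):
        for task, need in wakeups.get(cycle, {}).items():
            if need > 0:
                pending[task] = pending.get(task, 0) + need
        task = max(pending) if pending else EMULATOR
        trace.append(task)
        if task in pending:
            pending[task] -= 1
            if pending[task] == 0:
                del pending[task]
    return trace


# A device on task 7 asks for two cycles; a more urgent one on task 12 cuts in.
print(schedule(6, {1: {7: 2}, 2: {12: 1}}))
```

Because the tasks share one processor and one microcode store, a device controller needs very little control logic of its own.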
Alto: A personal computer
In 1981 I graduated, and landed a full time job in the PARC Computer Science Lab.  My project was building the Etherphone, and by then Chuck was busy building the Dorado, an ECL based personal super-mini.  I started slowly picking up Chuck’s design ethic:
“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away” – Antoine de Saint-Exupery
In 1984 I followed Chuck and Bob Taylor to the Systems Research Center of Digital Equipment Corporation.  We had 24×80 dumb terminals hooked up to a VAX-785 time sharing system.  This was not the same thing as a personal Dorado at all.  The first main project was the Firefly (I think the name suggestion was mine, actually), a multiprocessor workstation built out of commodity processors.  The first version used the Motorola 68010. Chuck designed the hardware.  I wrote the boot rom code.  Around then, however, Digital came out with the MicroVAX chip, and we immediately started a redesign. The Firefly used a coherent memory system we called the “snoopy cache”, where each processor “snooped” on the bus traffic of the others to maintain a consistent view of memory.  This scheme and its variations became the standard way to build small scale multiprocessors.  I designed the MicroVAX CPU modules, Chuck designed the memory system and the display controller.  The display controller was another minimalist creation – replacing the “standard” Digital display controller with one that did more and took half the space.  He also threw in audio I/O, with, I think, two extra chips and some microcode.  A typical Thacker design element for the CPU modules was his choice of a two-phase non-overlapping clock system. This let us use Earle latches implemented in 15 nanosecond 16L8 PALs for all the control logic, without needing any edge triggered registers or causing much heartache about timing.
Firefly: a multiprocessor workstation
After the Firefly, Chuck turned to networking, building, in 1987 or so, a 100 Mbps local area network called Autonet. I didn’t work on Autonet much beyond design discussions, but I came away as coinventor with Chuck on a routing patent.  How cool is that?
Chuck’s next big idea, around 1988 or so, was to build a liquid cooled minimalist 200 MHz computer in a single ECL gate array.  Bill Hamburgen of the Digital Western Research Lab knew how to do the liquid immersion cooling. Phil Petit of SRC worked on the CPU design.  My piece was the level-1 cache modules, designed using 1K bit Gallium Arsenide SRAMs.  This was a lot of fun.  We never built it, because the project was overtaken by Alpha.
Digital’s Alpha chip was a technical tour de force.  Chuck’s idea was to build multiprocessor development systems around the chip, to speed Digital’s time to market, and just maybe, to encourage a bit more minimalism among the Digital engineering community.  At that time, the spec for Digital’s “BI” bus design for multiprocessors ran to some hundreds of pages.  Chuck’s design for the coherent memory bus for the Alpha Demonstration Unit was 13 pages.
The Alpha demonstration unit: a high-performance multiprocessor
I designed the I/O system for the ADU. It was built with ECL100K, power dissipation no object, but very fast, and very clean signals. Chuck designed the memory system, and Dave Conroy designed the CPU module.  Dave also embodies Chuck’s minimalist spirit.  He kept a copy of the classic 5 tube AM receiver circuit on the wall of his cube, with the caption “If you want a job here, remove a part from this design”.
All American Five radio design
Wikipedia Article on All American Five
The ADU project probably saved a year time to market for Alpha products, and accelerated around a billion dollars in revenue.
As others have noted, while Chuck is primarily a hardware designer, “a humble purveyor of cycles,” he’s also an architect and programmer.  One time at Digital he got an early laptop, with TurboPascal, and immediately started writing CAD tools for himself.
I’ve gone off in different ways in the last 15 years, but I was able to visit Chuck in his lab at Microsoft last spring.  I have to say not much has changed, he was busy designing logic for an Ethernet controller, only this one runs at a gigabit and fits in a corner of the BEE3 FPGA system.
I am very pleased at the ACM’s recognition of Chuck’s contributions. Now go back to your offices and delete some logic or code that doesn’t really add anything. You will be one step closer to perfection.
-Larry