The Third Option

There seem to be two separate theories of good software design that everyone espouses:

1. Write good software to begin with, then hope it doesn’t fail. This is the put-all-your-eggs-in-one-basket theory, after making sure you’ve built a really good basket.

2. Write software, assume that it will fail, but make sure there is some redundancy in the system. Ensure it restarts when it fails, make sure there are two machines to handle the job, etc.

There are three assumptions that I hate about these all-too-common paradigms.

1. The first theory assumes that you can write software that doesn’t have any faults. This is demonstrably false. This theory is often espoused by younger programmers and the ideologues in your group. Beware.

2. The second theory assumes that you can’t write software. Nothing wrong with this, given number one above. However, this leads directly to the conclusion that any old crap will work and that testing and quality don’t matter. This is often espoused by older programmers and managers under time pressure.

3. Oh, the third assumption that I hate? The idea that these two options are the only ones.

Me, I prefer a third paradigm: combine one and two. Make sure you write the best stuff AND provide a backup. Duh. Let me tell you how this wasted nearly my entire weekend:

I work in a small software shop. We have a part-time MIS guy who works full-time programming for us. He’s pretty good. Unfortunately, he’s out of town at a tennis tournament in Vail. That means that I am the designated sufferer this weekend.

We’ve almost divested ourselves of all our Windows servers. We still have one, required for our CRM system, that runs MS SQL Server. From this server, we serve data to lots of places: inventory management, sales reports, etc. We do this from our Linux servers, running Perl and PHP scripts that access the SQL server through an ODBC connector running on the NT box. This works well, except when it doesn’t.

This ODBC connector (hereafter referred to as That Piece of Shit, or TPoS) runs as a service on the NT box. Here is where the design theories come in. Both the developer of the service and Microsoft fell prey to design fallacy number one above: the assumption that nothing would go wrong. The TPoS developer thought that their little piece of software would run forever, and provided no way of monitoring it or easily shutting it down remotely. Unfortunately, Microsoft also assumed that software running as services on their machines would run forever, and they too provided no easily accessible way of restarting or maintaining things remotely. This means that when something goes wrong, someone has to go sit at the console and restart TPoS by hand. It happened three times this weekend, for no obvious reason, and that is where my weekend went.
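For what it’s worth, the monitoring half of the third paradigm doesn’t have to be fancy. Here is a minimal sketch in Python of the kind of check I would have liked to have running on one of the Linux boxes. It is not anything TPoS or Windows provides. The function names (port_is_open, watch_once) are made up for illustration, and the sketch assumes the connector listens on a TCP port you can reach from the monitoring machine. It only detects the failure and calls a hook of your choosing; it does not restart anything on the NT box.

```python
import socket
import time
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def watch_once(
    check: Callable[[], bool],
    on_failure: Callable[[], None],
    retries: int = 3,
    delay: float = 1.0,
) -> bool:
    """Run check up to `retries` times; call on_failure only if every attempt fails."""
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            time.sleep(delay)  # brief pause so one hiccup doesn't page anyone
    on_failure()
    return False
```

You would call it from cron with something like watch_once(lambda: port_is_open(HOST, PORT), send_alert), where HOST, PORT, and send_alert are placeholders for your own server name, the connector’s port, and whatever wakes up the designated sufferer. The retry loop is there so that a single dropped packet doesn’t count as an outage.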

IMHO, this is one of the main reasons that MS servers are unsuitable for use in business. They require too much jiggery-pokery to run well. Yes, there are solutions to the above problems: running VNC (over a secure VPN, which is another thing to set up and maintain!) or some remote access tools (buy them as extras!). These solutions suck just as much as trying to get my Mom to run Linux on her desktop. This leads to lesson number two for the weekend: use the right tools for the job.

If Linux worked on the desktop, then it wouldn’t need so much hacking and *censored*ing around as it does to play well with others. It’s getting better, but it isn’t there yet. Don’t cry to me. But for this same reason, Windows isn’t ready for the server room. Just too much hacking to get it to work. Use the right tool for the job.

Of course, I’m saying this as I type in a Starbucks on the third option – OS X. Sosumi.