Link Bonding Craziness in RHEL/CentOS 5

I just went through hell in a handbasket trying to get 802.3ad Link Aggregation set up on a CentOS 5.2 Xen box.  Setting up link aggregation itself isn’t that bad – see http://wiki.centos.org/TipsAndTricks/BondingInterfaces for a simple guide (after your managed switch is configured) – but whatever I did, I was unable to get both interfaces simultaneously active.

About the only useful debugging info I got was that the MAC was in use.  I was puzzled because as far as I know, link agg takes over the primary MAC and sets that up for both NICs.  Furthermore, the same exact hardware was working great on Fedora 10.

bonding: bond0: Warning: the permanent HWaddr of eth0
 - [MAC ADDR]- is still in use by bond0. Set the HWaddr of eth0
to a different address to avoid conflicts.
bonding: bond0: releasing active interface eth0
bonding: bond0: making interface eth1 the new active one.
bonding: bond0: Removing slave eth1
bonding: bond0: releasing active interface eth1
ADDRCONF(NETDEV_UP): bond0: link is not ready
bonding: bond0: Adding slave eth0.

Luckily, I stumbled across this bug report.  If you scroll down to the last comment, this appears to be a Xen specific issue.  By default Xen tries to set its bridge up on eth0, and I assume this prevents the kernel bonding driver from taking over the NIC.  By opening up /etc/xen/xend-config.sxp and adding:

(network-script 'network-bridge netdev=bond0')

Xen will bridge to the bond0 interface, and everything will work as expected.

Another trick I had to do was add a start delay to the networking scripts.  This is useful if your hardware is crap (cough Broadcom), you need a DHCP lease and it fails, or you are running STP, link aggregation, etc.  On Fedora, RHEL, and derivatives this is accomplished by adding the NETWORKDELAY directive to /etc/sysconfig/network:

NETWORKING=yes
NETWORKDELAY=31

If you need more granularity, you can set delays for specific adapters in the /etc/sysconfig/network-scripts/ifcfg-{x} files with the LINKDELAY directive.

Just a couple of hard lessons from the trenches; hopefully this will save someone else some time.

Political correctness in open source doesn’t matter

See: http://www.itwire.com/content/view/22467/1090/.  To answer the sensationalist title, no.

I can’t believe people are trying to make this an issue.  I guess it was only a matter of time before the crap that we deal with in the rest of the world met up with open source.

I have seen several prominent developers on just the kernel that just happen to be women.  Many more on large projects like KDE.  Great, big deal.  They shouldn’t receive special privileges, recognition, or anything because of it.

“The strange thing about this episode is that it looks like the FOSS community seemingly doesn’t want to know about it.”
No shit.  This kind of crybaby attitude is why governments and large corporations can’t get anything done, too worried about offending people.  Most of us FOSS people are here for the goods, not to set up bureaucracy, politics, and political correctness.

This Gentoo developer is spot on: http://steveno.wordpress.com/2008/12/17/mad-gnu-women/.  The existence of women-only groups like Debian Women is wrong in the first place and harmful.  In the same category as Richard Stallman – well intentioned but counter-productive.

Isn’t the goal to write and use good software?  Gender has nothing to do with that.  Neither does race, color, or being the stereotypical guy that spends countless hours hacking away in the parent’s basement.  Yet we think nothing about laughing at the last.  The world would be a better place if people just grew thicker skin.

How to upgrade to ext4 in place

Here’s how you upgrade to ext4.  The process is pretty easy, but requires an fsck which means unmounting or rebooting if the file system is in use.

Make sure you are using at least e2fsprogs 1.41.3 and kernel 2.6.28 (or a vendor kernel with the latest ext4 patches applied)!  Also, it’s probably a good idea to have proper backups (really!).  ext4 has just been declared stable, but what that really means is that the battle hardening has just begun.  I’ve done several heavily used systems without fault so far though, so it’s probably good enough for your desktop.

WARNING: DON’T CONVERT YOUR /boot PARTITION. Right now, there is no stable version of grub with ext4 support.  Even if there were, it really wouldn’t gain you anything  :-) .

Run tune2fs, e.g.:

tune2fs -I 256 -O sparse_super,filetype,resize_inode,dir_index,ext_attr,has_journal,\
extents,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize /dev/sd[x][n]

Those are the default options for an ext4 file system if you were to create it with mkfs.ext4 (e2fsprogs 1.41.3 – see /etc/mke2fs.conf).  I’m getting pretty damn good performance with this!  The ‘-I 256’ option sets 256-byte inodes, which most recent ext3 FSs use already.  If this is the case, and you get a message telling you so, remove this option.  Note that extents will make the FS backwards INCOMPATIBLE with ext3.

Next, edit /etc/fstab, e.g.:

/dev/vg/home /home ext4 defaults 0 0

Either unmount and mount or reboot your system.  tune2fs marks the fs as dirty and performs a fsck and conversion.
NOTICE: on distros that use an initrd, the initrd may need to be regenerated or you won’t be able to mount your root file system.  In Fedora (replace kernel version with your own):

cd /boot
mv initrd-2.6.27.7-134.fc10.i686.img initrd-2.6.27.7-134.fc10.i686.img.old
mkinitrd initrd-2.6.27.7-134.fc10.i686.img 2.6.27.7-134.fc10.i686

That’s all there is to it.  Stay tuned for future ext4 developments like online defragmentation.

Also, ext{2,3,4} reserve 5% of space for root in case the drive fills up.  On large modern drives, this can be excessive (e.g., 50GB on a 1TB disk).  Consider running ‘tune2fs -m 1 /dev/sd[x][n]’ to reduce this to 1%.
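
To put numbers on the reserved space, here is a minimal Python sketch of the arithmetic.  The function name is my own invention, and the disk size is just the 1TB example from above in decimal units (1TB = 1000GB); plug in your own partition size.

```python
def reserved_gb(disk_gb, reserved_percent):
    """Space held back for root, in GB, for a given reserved-blocks percentage."""
    return disk_gb * reserved_percent / 100


if __name__ == "__main__":
    disk_gb = 1000  # the 1TB example disk, in decimal gigabytes
    for percent in (5, 1):
        print(f"{percent}% reserved: {reserved_gb(disk_gb, percent):.0f}GB")
```

Going from the default 5% to 1% on that disk hands 40GB back to regular users.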

For more information and tweaking:

  1. Documentation/filesystems/ext4.txt from the latest kernel sources
  2. http://ext4.wiki.kernel.org/index.php/Main_Page
  3. man tune2fs
  4. http://e2fsprogs.sourceforge.net/

Why alternative search engines are bound to fail

I’ve seen a lot of new bots crawl my site recently and it got me thinking about search engines.

Basically, search is a solved problem.  Of the alternatives I explored, most offered nothing exciting, just gimmicks such as magazine-style page flipping or thumbnail previews.  I hardly see how that would help me locate useful information on the web.

No, search these days isn’t about presentation, it’s about quality of results and how much of the web you index.  While the former can always be improved, right now Google completely dominates at index size.  Additionally, both of these are driven primarily by money: attracting great thinkers – Google – and performing massive scale-out – Google again.  Google has the momentum to not only keep its current lead, but continue to refine and improve web search at a rate much greater than competitors.

The only companies (or will it be company?) that can really compete with cash and size are Yahoo and Microsoft.  Yet even both of these are dwarfed in comparison to Google[source].  So your new search startup has basically ZERO chance of success unless you are lucky enough to get swooped up by one of the three mentioned above.

Obviously I’m talking about web search here.  There will always be a place for niche search like local language, blogs, etc just like there will always be a place for niche products of any kind.

But I just can’t imagine who, especially in this economy, is funneling money into these other web search startups.  You might as well pour it down the drain.

NIN Echoplex live rehearsal

One of my favorite blogs, Create Digital Music, ran a piece on NIN doing a rehearsal for Echoplex.

It’s neat to see the Lemur, a touch screen MIDI controller, used live by the drummer.  Overall the song structure is relatively simple but it is amazing how well they keep together.  Notice the dude manning their monitoring setup full time and the in ear monitors 8-)?

I need to get a hold of an old touchscreen PC and see if I can convert it into a Linux MIDI controller with Open Sound Control and maybe some custom Java.  It’s sad because I had something perfect for the task a few years ago, an IBM Industrial panel PC, basically a laptop in a bulletproof magnesium case with touchscreen :-(.

If you are interested in this sort of thing, click through, otherwise enjoy the tune below.

http://createdigitalmusic.com/2008/12/17/how-they-work-nin-echoplex-rehearsing-live-with-lemur/

Also, check out Linux Journal’s Java Sound and Music three part series.


NIN: Echoplex – Live at Rehearsals, July 2008 from Nine Inch Nails on Vimeo.

Linux and the PC Mentality

Here’s a minor thought dump of the moment: no hard numbers, just some general observations over the past year or two.

I’ve heard the comparison between Linux and other UNIX like OSes a few times in the past.  Of course it is a bit hard to compare “Linux” – which is a kernel – and the others which most often are complete operating systems.  Many times they even share a lot of GNU userland tools, so I will disregard userland.

The general feel I get from looking at some parts of Linux and its development process is that of the PC world.  Build cheap, throwaway solutions and rip and replace later.  Looking at things like the scheduler, FreeBSD was able to grow a similarly advanced solution in a much more organized fashion, without ripping and replacing this critical component several times, and with far fewer developers.  *BSD has a more traditional UNIX process from the days of the minicomputer and professional workstation, while Linux is deeply rooted in PC culture.

Aside from the scheduler, there seem to be other places where development has tarnish before shine.  hrtimers, dyntick, CFS, and SLUB are some of the more controversial moves in the past year or so.  Indeed, kernels 2.6.23+ didn’t “feel” as polished as the few leading up to it and “felt” less responsive to me when playing audio or video concurrent with other tasks like compiling – something CFS was supposed to fix, but I never had a problem with before.  Going into 2.6.28, these big changes are starting to settle down.  Things once again “feel” better, more stable, and better performing – so I guess it is just a phase.

From a software engineering standpoint, it is interesting to ponder how to improve the process.  One thing Linux has that many other projects don’t is an enormous number of contributors.  A lot of people and companies are trying to do different things with the kernel.  It is important for them to get their code into the kernel as soon as possible, because it is such a fast moving process, but new features lead to bugs.  Linux-next appears to be helping by keeping in-progress trees so that developers can explore ideas before pushing to mainline, and Adrian Bunk’s idea of the “long time support” kernels for those of us that don’t run large distro kernels seems like a step in the right direction.  I do wish he would forgo 2.6.27 and base it on 2.6.28 for the next LTS though.  2.6.27 has had quite a lot of major patching done since release and as I said earlier, 2.6.28 just looks better so far.

Another point: technically superior solutions don’t always get favored.  I’ve said before that I think there were better choices for file systems than ext3 for the past 8 or so years.  CFS vs. RSDL is another example.  The latest is SCST vs. STGT (http://lwn.net/Articles/311344/).  SCST is clearly the only solution for the enterprise where Fibre Channel and InfiniBand are likely to be used.  And Ingo and Thomas are at it again reinventing the wheel with performance counters.

The counterargument is that Linux runs on everything from most of the world’s supercomputers all the way down to small critical embedded systems.  Most of these are running obsolete kernels with heavy vendor backports though.

Running the latest kernel is fun; after all I am a Gentoo user so I don’t mind when things break, but for truly mission critical tasks I would not be afraid to look outside of Linux to BSD.  Ironically, Solaris, once the stalwart of enterprise, seems to be embracing the Linux mentality.

USB 3.0 on Linux

Take a look at this Intel developer’s blog:

http://sarah.thesharps.us/2008-12-07-13-35.cherry

The video shows a USB flash drive transferring video at 125MB/s!  To give you an idea of this speed, it is likely more than your hard disk can put out, which is probably around 80MB/s.  The rest of the article gives a good overview of USB 3.0 and she states that the bus should have about 400MB/s bandwidth in the real world.  This is a breath of fresh air for external devices of all sorts, and I can’t wait to see this on new computers.
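
As a rough illustration of what those rates mean in practice, this Python sketch compares idealized copy times.  The 4000MB file size is an arbitrary assumption of mine; the 125MB/s and 80MB/s figures are the ones quoted above, and real transfers will vary with overhead.

```python
def transfer_seconds(size_mb, rate_mb_per_s):
    """Idealized time to move size_mb at a sustained rate, ignoring overhead."""
    return size_mb / rate_mb_per_s


if __name__ == "__main__":
    size_mb = 4000  # hypothetical file size, roughly a DVD image
    for label, rate in (("USB 3.0 demo", 125), ("typical hard disk", 80)):
        print(f"{label}: {transfer_seconds(size_mb, rate):.0f}s")
```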

The Linux drivers are currently under development, with the subsystem patches going into review.  The xHCI driver will have to wait until Intel finalizes and releases the specs.  Anyways, it’s safe to say that Linux should have USB 3.0 support as soon as products hit the market in the middle of 2009.

Bulletproof your server to survive Digg/Slashdot

Implementing scale up for web 2.0 sites with current practices

This blog was recently featured on Slashdot over the Thanksgiving holiday in the US.  It was the perfect storm: commercial news organizations were mostly dormant creating a slow news day, and geeks like me were at home eager to get the latest technology scoop.  What surprised me is how this relatively modest box, a Linode 540MB Xen Virtual Machine, withstood up to 100+ requests a second without even breaking a sweat.  Furthermore, I had only performed some of the tuning I detail below.  It scales to over 1100 requests a second after following my guide below!

I will detail how to tune your server for optimum capacity, or what I will call free scale up (as opposed to scale up by adding hardware or scale out – adding machines, database servers, application servers, load balancing – which may come in a future article depending on interest).  Most of the ideas here are platform neutral – both OS and application server – assuming you are using a UNIX style OS.

The only tutorials I’ve found are dated and don’t detail the latest practices like Varnish or Passenger, so read on for a fresh look.

Audience

The intended audience for this article is anyone running a web site.  Running your own web server gives much greater flexibility in choice of development environment.  A dedicated server and certain virtual private server providers give much more predictable performance and won’t cancel your service on a whim (I’m looking at you, MediaTemple… Google for horror stories).  A Linode VDS is much more flexible and very powerful for around the same cost.

Web Server

Most people use Apache.  According to Netcraft, over 50% of hosts were running it as of November, and for good reason: Apache has proven stability, scalability, and security.  Some folks are quick to rip out Apache due to poor configuration and tuning.  I personally find it to be an excellent choice for most sites because of the aforementioned traits and first-rate extensions.  With proper setup, you will likely max your transfer or tax your application server before it ever becomes the bottleneck.

Apache Tuning

The key to tuning Apache is to minimize RAM usage, especially on a limited machine like my 512MB Linode.  Memory swapping of applications to disk is almost entirely unacceptable on modern servers.  Disk I/O is very expensive and the biggest bottleneck on modern computers, which is why swap is so unappealing.

Therefore, you need to:

  1. Limit overall Apache memory usage
  2. Minimize per thread/process memory usage
  3. Minimize disk I/O

Limit overall memory usage

Step one is very important.  If your server begins swapping heavily, it can be very difficult to even log on and perform administration.  You need to develop an idea of the RAM an average Apache process is using via top, ps, or another monitoring framework.  Make sure you are looking at the RES column in top, since shared libraries will be shared between all processes.  Take the amount of available RAM and divide it by this number.  Available RAM should take into account RAM used by other processes, including your database, when under a comparable load.  Set the MaxClients directive to a number close to the result, and tune accordingly with benchmarks (see Benchmarking section).
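
The sizing rule can be sketched in Python: RAM left over for Apache divided by the per-process RES figure.  Treat this as a starting estimate only.  The function name and the numbers in the usage line (512MB total, 200MB reserved for the database and OS, 20MB per Apache process) are hypothetical, not measurements from my box.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, per_process_mb):
    """Estimate MaxClients: RAM left for Apache divided by per-process RES."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    # RAM that other processes (database, OS, proxy) need under load is off limits.
    available_mb = max(total_ram_mb - reserved_mb, 0)
    return int(available_mb // per_process_mb)


if __name__ == "__main__":
    # Hypothetical figures; measure your own with top or ps.
    print(estimate_max_clients(total_ram_mb=512, reserved_mb=200, per_process_mb=20))
```

Rounding down is deliberate: a MaxClients that is slightly too low costs some concurrency, while one that is too high pushes the box into swap.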

Minimize memory usage

Step two determines how many child processes you can handle.  This is important because the more children, the more in flight requests, and lower end user latency.  This is also a lot more environment dependent than step one.

A good way to reduce memory consumption is to unload unneeded modules.  Most server operating systems ship Apache with a wide range of modules enabled that are probably not used on your site, including several basic authentication methods.  Using shared objects rather than static modules will help memory usage as well, and most distributions ship this way.

If you use an Apache module for your application server (mod_php, mod_perl, mod_python, Passenger aka mod_rails), each child process will consume the memory of that module regardless of whether or not it is serving a static asset (images, css, etc.) or an application page.  Mitigate this by using a proxy (see next section) or moving application serving to its own processes via FastCGI (PHP, most others), AJP (Java, Python), WSGI (newer Python), proxy (Ruby, all).

Disable logging

I should take a moment to step back and hit on an important topic.  Hard disks have improved very little in regard to performance in recent years.  Disk I/O is an expensive task and therefore the primary bottleneck you wish to avoid.

When Apache logging is enabled, a write operation must occur for every hit.  If possible, consider completely disabling access logging.  You can outsource web statistics to Google Analytics.  If you require logging, make sure HostnameLookups is disabled (network I/O is even more expensive than disk!) and batch look-ups on another machine or during idle periods with a log analyzer.  As your setup grows (scale-out), log files will become more cumbersome and you will probably be logging to a database or a central server anyways.  Varnish, a proxy/HTTP accelerator detailed below, has an optimized design for logging.

mod_cache

Apache has an integrated cache module that will keep frequently hit static assets in memory.  For larger sites, forgo this and use a proxy which will be more flexible and allow easier scale-out.

MPMs

Apache makes use of MPMs, or Multi-Processing Modules, for its core functionality.  The default on UNIX is prefork, which handles each connection in its own process.  By switching to a threading MPM such as worker or event, you can cut down overhead and memory use.  Some modules do not play well with threading (PHP), so you should research before changing MPMs.  prefork works well for one and two core servers.

Alternative Web Servers

Lighttpd is the leading alternative FOSS web server.  Users include A-list web sites such as YouTube and Wikipedia.  Benchmarks show impressive performance.  Keep in mind Apache is by no means slow or resource intensive, and links on that page show that it is faster on some workloads.

When making comparisons, keep in mind that by design you will probably be using a FastCGI application server and most of the optimizations above will hold true for Lighty.

For sites with long connection times (download servers, AJAX keep-alive) or static content servers, I would definitely lean toward it (scale-out).

Nginx has also been picking up steam (pun intended) and is being used by large sites like WordPress.com.  I would consider it in the same class as Lighty.

Reverse Proxying

A reverse proxy is very useful for modern web serving.  Even with just one server, a reverse proxy will keep common pages in memory – greatly reducing disk I/O.  It will also keep static requests from using potentially heavy application server HTTPd processes.  Proxies are often very fast at basic HTTP since they are not concerned with all the features of a web server.  When it comes time to scale-out, the proxy can be moved to a separate server.  Proxies can direct traffic to different backend servers.  Proxies can even be placed in geographically dispersed areas (think CDNs: Akamai, Limelight – YouTube, Google).  Logging, compression, and SSL can be offloaded to the proxy.  In short, you want a proxy even on a single server (or at least mod_cache).

Varnish

Varnish bills itself as an HTTP accelerator.  It was written from the ground up to perform reverse proxying, and this it does well.  The Varnish design philosophy is enlightened and leaves a lot of the work like memory management to modern advanced operating systems.  Logging is performed in a separate process and is optimized.  If you need an advanced proxy and accelerator, this is likely the way to go.

Squid

Squid has traditionally been used as the de facto FOSS forward and reverse proxy.  Many large sites such as Wikipedia are extensive users.

Apache and Lighttpd

Both Apache and Lighttpd have modules that will allow them to cache and reverse proxy.  For single server setups, it would probably be worth reusing the components of your web server (think: shared memory) if your application server is external.  mod_proxy is very useful for forwarding Ruby requests to a Ruby web server like Mongrel or Thin.

Application Server

The application server is where most of the magic happens in today’s web 2.0 sites.  Gone are the days of static HTML files.  Most sites are now dynamically generated every visit, and customized per visitor.  This is an order of magnitude more complex, and a lot of CPU time is spent on page generation.  Therefore, tuning here is often one of the best things you can do to improve site scalability.

PHP

PHP is the most widely deployed language on the web.  Many extremely popular applications are written in PHP, including: MediaWiki, WordPress, Drupal, and phpBB.

Opcode Cache

By default, PHP breaks a script down into opcodes every time it is called.  Opcode translation is necessary to simplify programs so they can easily be parsed by the Zend Engine.  It is unnecessary for this to be done every time a script is called since the source code will rarely change once deployed.  Luckily, a cache can be added that will eliminate this step.  The net performance gain can be a factor of 2 to 10, very impressive for a simple install!

These days, you should choose APC – The Alternative PHP Cache.  Once upon a time, there were several choices here.  Turck MMCache was notably fast, beating even the commercial Zend Suite, but mysteriously died out (the original author is now a Zend employee. hmm.. coincidence??).  Others have tried to revive it in the form of eAccelerator, but it is neither stable nor active.  Any other arguments are moot since APC will be part of PHP6 core as well as having PHP’s founder as a developer.

Modules

Just as with Apache, removing unused extensions in PHP will help reduce memory usage.  These can be commented out in php.ini.

Rails

Rails has gained a lot of steam (okay I’m wearing that one out) and is a favorite among many Web 2.0 startups including Twitter.

A lot of Rails scalability problems are due to the underlying Ruby language.  The garbage collector, threading, and memory allocator have been pinpointed as particularly bad.  Work is underway to fix these in Ruby 1.9 (bytecode) and 2.0 (threading).  In the meantime, consider Ruby Enterprise Edition in tandem with Passenger.  Personally, I’d rather avoid Ruby and all you kool-aid drinkers (but I’ve done a large deployment of Passenger).  Go Python :).

Python

Python is just a plain good language.  With that out of the way, like all the other scripting languages, Python is supposed to be getting a JIT implementation sooner or later.  Psyco can yield an average 4x performance improvement and is available now.  PyPy should be here sooner rather than later.

Java

Due to the Java language design, code is JIT (Just-In-Time) compiled and you don’t have the compilation problem that the dynamic languages above do.

Java web apps are immensely complex, and aside from the latest JDK (1.6.0.10), your container will play a big role in speed.  Jetty and Tomcat are always good choices.

Databases and Database Caching

A large portion of modern web applications are database driven.  To keep your site running, this point of contention must be addressed.  MySQL is ubiquitous and known for its speed.  PostgreSQL offers some advanced features and is known as the DBA’s FOSS database.  If you need extreme scalability, consider DB2 but prepare to pay dearly :-).

MySQL

MySQL comes configured fairly well out of the box in most distributions.  MySQL Performance Blog sums it up better than I can, so head that way for basic tuning info.

Probably one of the easiest things you can do is enable the integrated query cache.  The good news is your application doesn’t need to do anything to take advantage of this.

in my.cnf:

query_cache_size = 64M

For single server web workloads, this simple change can work miracles and prevent dreaded MySQL connection errors.  This is especially true since web apps are primarily read oriented.  The query cache isn’t perfect in all situations, and in larger sites memcached is more appropriate but has its own disadvantages (see memcached section).

PostgreSQL

PostgreSQL should also be set up fairly well by your distribution.  shared_buffers should probably be tuned, as well as max_connections.  See the PostgreSQL wiki on tuning for a good overview.

There is nothing strictly akin to the MySQL query cache, for better or worse.

Applications and Application Caching

This is potentially the hardest step to implement, yet can also yield the greatest reward.  Caching common database queries, objects, modules, or even writing static HTML versions of a page can cut server load to nothing.  If you are using a common FOSS (free) or COTS (commercial) product, chances are the software already implements some of these options and they may just need to be activated or downloaded as an extension.

Keep in mind not all things are effectively cached, and you may need to perform a major rework to implement aggressive caching like this.

Generic Data Caching – memcached and APC

Many common applications contain backends for caching against memcached or APC.  MediaWiki is a prime example, integrating nicely with either one.  If you are writing your own apps, using a memory cache can greatly reduce dependency on the database.

memcached

Realizing that databases have a lot of constraints, the folks at LiveJournal.com wrote a generic caching framework called memcached.  Most large sites such as Facebook, Wikipedia, and Slashdot are all using this.

The bad news is you have to port your application to store and check against memcached.  Database queries are a prime target, but just about anything can be stored here.

It is also handy for scale-out because you can add dedicated cache servers.
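
The "store and check" porting work usually boils down to a check-the-cache-first wrapper around each expensive query.  Here is a minimal Python sketch of that pattern.  Everything in it is an assumption for illustration: DictCache is a hypothetical in-process stand-in for a real memcached client, and cached_query and the key name are made up.

```python
import time


class DictCache:
    """Hypothetical in-process stand-in for a memcached client (get/set with TTL)."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry timestamp or None)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            del self._store[key]  # expired, treat as a miss
            return None
        return value

    def set(self, key, value, ttl=None):
        # A ttl of None or 0 means the entry never expires.
        expires_at = time.time() + ttl if ttl else None
        self._store[key] = (value, expires_at)


def cached_query(cache, key, run_query, ttl=60):
    """Return the cached result for key, running the query only on a miss."""
    value = cache.get(key)
    if value is None:  # note: a query that returns None is never cached
        value = run_query()
        cache.set(key, value, ttl)
    return value


if __name__ == "__main__":
    cache = DictCache()
    rows = cached_query(cache, "recent_posts", lambda: ["post 1", "post 2"])
    print(rows)
```

In a real deployment the cache object would be a client talking to one or more memcached servers, but the calling code keeps the same shape.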

APC user cache

PHP APC users can manually store information in APC’s shared memory.  This is ideal for single server solutions.  Take a look at this performance comparison vs memcached and files.

Application Caching

Although most pages are dynamically generated these days, a lot are needlessly so.  For example, a content management system might include a header, content, comments and a footer.  This output can be written out as a static HTML page whenever an author updates it.  Static pages are then served until a user comments on an article, which triggers a cache invalidation and the page is rendered and stored again.  The output of generated menus, columns, and other objects can be stored in cache form as well.
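
The render, serve, invalidate cycle can be sketched in a few lines of Python.  This is an illustration under my own assumptions, not how any particular CMS does it: the class name is made up, and a dictionary stands in for static HTML files on disk.

```python
class StaticPageCache:
    """Keeps rendered HTML per article until something invalidates it."""

    def __init__(self, render):
        self._render = render  # function: article_id -> HTML string
        self._pages = {}       # stand-in for static files on disk

    def get(self, article_id):
        # Render only on a miss; otherwise serve the stored copy.
        if article_id not in self._pages:
            self._pages[article_id] = self._render(article_id)
        return self._pages[article_id]

    def invalidate(self, article_id):
        # Call this when the article is edited or a comment is posted.
        self._pages.pop(article_id, None)
```

The expensive render function runs once per change to the article rather than once per visitor.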

WordPress Cache Plugins

WordPress has a couple of plugins that are mandatory for large sites.

WP Super Cache will generate static HTML files of posts on your blog.  They are automatically served via some mod_rewrite magic, and will expire and update automatically.  This can effectively reduce load to almost nothing – it completely eliminates database access and PHP execution.

WP Widget Cache is a nice addition that will cache output of widgets (sidebar elements such as menus) that don’t commonly change.

Benchmarking

It is important to benchmark your site after making changes to see if it meets performance expectations.  ab is a common tool for this task.

The following will run 10 concurrent requests for 3000 total against localhost:

ab -c10 -n3000 http://localhost/

Be very careful when benchmarking a live site.  You could effectively Denial of Service your server while it is processing all those requests.

What do you think?

I’d be happy to hear your stories from the trenches.  Please share your tuning advice!

Tux3 by Christmas?

Development seems to be going well for Tux3.

Daniel Phillips of Tux3 just posted the following to the LKML:

The big goals for Christmas (this Christmas!) are:

  • SMP locking
  • Atomic commit
  • Posixly complete
  • Rudimentary fsck

With the following comical reference

With atomic commit, we will progress from “buggy Ext2 equivalent with missing features” to “buggy Ext3 equivalent with missing features”.

Not a bad place to arrive at in five months, starting from scratch. Does anybody out there still doubt that the community process works, and is the best way to develop really complex software? Believe it.

See the whole post here: http://lkml.org/lkml/2008/12/10/358

Retrocomputing for Fun and Profit

  1. Buy Old Computers
  2. ???
  3. Profit

What is retrocomputing?

I define retrocomputing [wikipedia] as the collecting and use of old computers.  Why might one do this?  Well, for one, enterprises cycle out machines fairly frequently.  Systems 2, 3, 4, and 5 years old are often sent out to scrappers in droves despite still being plenty useful.  Top of the line systems for large companies often have more than enough power for small and medium sized ones at pennies on the dollar compared to new hardware.  These machines are likely complete overkill for home use, but nonetheless are very useful for fun and learning.

IBM mainframe ops in the 1980s

Why?!

A lot of what I know about computers has been learned on old machines.  Hooking up a couple of servers and desktops and trying to make something useful out of them is a great exercise for the aspiring system administrator.  With open source software, it can all be done freely and easily.

Yes, you can run Linux, BSD, and Solaris from the comfort of your Windows desktop in a virtual machine (weak sauce…).  Yet there is something much different when you cluster several high technology servers together, tethered to a Fibre Channel storage array and have them share a single distributed file system.  The knowledge of setup, installation, and troubleshooting I’ve gained from mock scenarios like this I cannot compare to anyone else I’ve ever met.  Breaking things here usually means digging deep and fixing it.  If you were to screw something up at work like some of the things I’ve gotten into, it would probably cost you your job.

BENCHNET - where I rip into computers that cost as much as a house and my "production" rack

Retrocomputing is also fun.  I am personally into old IBM hardware, though old UNIX workstations of all sorts are interesting to me.  You can see my collection of IBM PS/2 and RS/6000 knowledge here: http://ps-2.kev009.com:8081/.  There is a particular thrill to booting up a machine that cost between $20,000 and $50,000 10 years ago.  Knowing that these same machine models were used to design the Boeing 777, composed the famous Deep Blue machine, and were used in the largest automotive and shipbuilding firms, not to mention some of the most important spacecraft to date, also brings a sense of power and nostalgia.  In some ways it’s similar to having a classic car, but different.  Maybe if that classic car was a big ass bulldozer, tank, jet or some other well engineered piece of equipment :-P.

Some old systems I had at one time or another.  Left to right: IBM PS/2e (first "green" environmental pc), RS/6000 43p (7043-140), Apple PowerMac 7100/80, RS/6000 7006-42W, RS/6000 7012-397, HP Visualize c360 (PARISC)


Nostalgia is one of the biggest things I get out of using particularly old hardware.  I missed the mainframe days, the minicomputer days, the PC and DOS days, the Apple II days (well, actually I used these a bit at a very young age), and to a degree the early Windows days.  Just like a history class, studying these old machines gives me insight as to why things are done the way they are today.  It gives me appreciation for modern systems and makes me write clean and well optimized code.  The old computer games that captivated me as a child (Sim City, Sim Tower, Sim Ant, Sim Farm, Gizmos and Gadgets, The Incredible Machine, Oregon Trail etc.) implanted a high degree of logic and understanding at a young age and it is heartwarming to revisit these.  I grew up a Mac user as well, so seeing what I was (or: was not :>) missing on PCs is also interesting.

Old MIPS UNIX server booting and logging in

Some of the benefits of retrocomputing:

  • Enterprise class hardware
  • Cheap, possibly even free
  • Different design philosophies – not everything is x86 – a lot of this gear is quite different.  For example, UNIX workstations integrated most of what we enjoy on our PCs years before it became available to consumers.  SGI machines were doing A/V and 3D in the early 90s.  IBM midrange AS/400s have an advanced integrated database, programming languages, and environment that make PCs look like a joke for business programming.  WinFS, Object Storage Devices, etc are just now being talked about for PCs.  The channel philosophy from mainframes is still pretty new to PC servers (fibre channel), not to mention virtualization.
  • If you break it, you can fix it and learn from it or toss it
  • The engineering and craftsmanship in some of these systems is downright astonishing
  • Old computers are works of art: they give you a window into the technology and culture of times past
  • You should never trust a computer you can’t lift

It is interesting that we as humans produce such elaborate machines, only to discard them as scrap a few years later.  It is humbling and shows you the incredible progress we are making.

How?

eBay is your friend, but also look for local scrapyards or businesses doing overhauls.

If you are faint of heart, plenty of good abandonware sites exist for games and operating systems that can be run on emulators or VMs.  Check out this IBM mainframe emulator, Hercules.  Some of the original IBM OSes are public domain.

If you don’t want old PCs and big iron overtaking your house, there is plenty of good material on YouTube as well.  The Computer Museum is a good start.  Some of the consoles, offices, and outfits are hilarious.

Old SGI tech demo – pretty impressive!