My thoughts on software and complexity

My thoughts on the growth of the Linux kernel and the status quo of using and developing software.

Prompted by a discussion of this article: 1986 Mac Plus Vs. 2007 AMD DualCore. You Won’t Believe Who Wins

[Ed: My response to accusations of Linux kernel bloat]

The [Linux] kernel has never really been the problem. In 1 to 2 MB of compressed, compiled code on my computer (gentoo-sources plus my custom .config and a couple of patch sets from future merges), there is some of the most advanced file system, networking, protocol, hardware, and scheduling code ever conceived. Granted, many areas need work and are constantly being updated, but find me a kernel that supports NUMA, scales nearly linearly with SMP, implements fair queuing for I/O and fair CPU scheduling, runs tickless (NO periodic tick interval), offers virtualization, and supports a wide gamut of platforms and hardware. It runs on systems as small as a microcontroller and as large as BlueGene/L. Did I mention it is free and I can learn from and hack on it?

The kernel isn’t expanding at a rate worth worrying about, because only a small subset of it is needed for most users and systems. No, the problem really lies in user space on UNIX systems. A modern UNIX userland involves many layers of programs interacting with and building on top of each other, and I don’t see that getting better in the future. As higher-level programming languages are used, more and more layers are added to the onion. This can make a programmer’s life easier and allows more complex systems to be designed, but there are many drawbacks as well: bug creep, feature creep, poor usability, complexity, and resource usage all come to mind.

Do I know the answer? Not at all. I don’t think there is one. Software will develop organically in the wake of hardware progress for the foreseeable future. If and when that progress slows, perhaps things will change course, with a sea change toward compiler optimization, small-is-beautiful engineering, and an emphasis on efficiency.

Syncing Directories with Multiple Computers

I have a laptop, a workstation, and a server at home that I use daily. I also have a collection of code, documents, and music that is useful to have locally on all three, especially on the laptop when traveling. For a while I would just copy files over the network (NFS and CIFS) or work directly off the server, but this has become tedious as the amount of data has grown. So I went looking for a syncing app.

I had two requirements:

  • It must not run as a service – I don’t want yet another program loading at boot time or hogging RAM. Transfers should preferably use NFS, since I already run it and it is fast.
  • It must be initiated from the client side – I don’t want to SSH into computer X to push or pull an update.

Unison

Unison has received a lot of hype, and it certainly shows a lot of potential. Among its notable features is bidirectional syncing: you can have changes in both the local and remote directories and merge them interactively, and a GUI simplifies this and the handling of collisions. During my test, however, performance was terrible and it crashed during the merge. I read in the FAQ that using Unison over NFS isn’t ideal; it prefers SSH or a socket connection, which goes against my requirements. In short, this is a program to watch, and it may be suitable for smaller file sets, but it simply did not work well here. The final strike is that it is no longer an active research project and is only lightly maintained.

rsync

rsync is a very powerful program. Its transfer method minimizes I/O: by default it skips files whose size and modification time are unchanged, and over a network connection it sends only the parts of a file that differ. It is essentially unidirectional, so it lacks Unison’s bidirectional merging. For my use case this won’t be a problem, since updates happen on one system at a time and are usually pushed to the server. Gentoo users will be all too familiar with rsync from updating Portage with ‘emerge --sync’.
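When both paths are local, as they are with an NFS mount, rsync copies changed files whole by default, so the savings come mostly from its "quick check" that skips files whose size and modification time already match. The Python sketch below imitates only that decision, under simplifying assumptions: timestamps are compared to the whole second, and it reports which files would be sent instead of copying anything. It is not rsync's implementation and leaves out the delta-transfer algorithm used over a network.

```python
import os
import tempfile
from pathlib import Path


def needs_transfer(src: Path, dst: Path) -> bool:
    """Simplified rsync-style quick check for a single file.

    The file is sent when the destination copy is missing or differs in
    size or modification time. Times are compared to the whole second here,
    which is a simplification of rsync's own timestamp handling.
    """
    if not dst.is_file():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def files_to_send(src_root: Path, dst_root: Path) -> list[str]:
    """Relative paths under src_root that the quick check would transfer."""
    changed = []
    for src in sorted(src_root.rglob("*")):
        if src.is_file():
            rel = src.relative_to(src_root)
            if needs_transfer(src, dst_root / rel):
                changed.append(rel.as_posix())
    return changed


if __name__ == "__main__":
    stamp = 1_000_000_000  # fixed mtime so "unchanged" files really match
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        src, dst = Path(a), Path(b)
        for name in ("edited.txt", "new.txt", "same.txt"):
            (src / name).write_text(name)
        (dst / "same.txt").write_text("same.txt")  # identical copy
        (dst / "edited.txt").write_text("an older, longer version")  # size differs
        for p in (src / "same.txt", dst / "same.txt", src / "edited.txt", dst / "edited.txt"):
            os.utime(p, (stamp, stamp))
        print(files_to_send(src, dst))  # ['edited.txt', 'new.txt']
```

Only the missing file and the one whose size changed are reported; the identical file is skipped without its contents ever being read.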

Using rsync is simple: ‘rsync -av SOURCE DESTINATION’ will be suitable for most jobs, and a quick look at the man page gives a rundown of the options. Using a hub-and-spoke topology, I can sync all three machines in any direction: laptop <-> server <-> workstation.
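The hub-and-spoke layout can be scripted so every machine runs the same two commands. The Python sketch below only builds rsync argument lists; the mount point /mnt/server and the home directory /home/user are hypothetical placeholders for wherever the server's NFS export and the local copies live. It uses the same -av flags as above, plus rsync's --dry-run option for previewing.

```python
import shlex

# Hypothetical paths: the server's NFS export mounted on each spoke, and the
# local home directory holding the synced copies. Adjust to your setup.
HUB_MOUNT = "/mnt/server"
LOCAL_HOME = "/home/user"


def rsync_argv(source: str, destination: str, dry_run: bool = False) -> list[str]:
    """Build an 'rsync -av SOURCE DESTINATION' command as an argument list."""
    # -a (archive) recurses and preserves symlinks, permissions and mtimes,
    # among other attributes; -v lists what is being transferred.
    argv = ["rsync", "-av"]
    if dry_run:
        argv.append("--dry-run")  # report what would change, copy nothing
    # A trailing slash on SOURCE copies the directory's contents into
    # DESTINATION instead of creating SOURCE's name inside it.
    argv += [source.rstrip("/") + "/", destination.rstrip("/") + "/"]
    return argv


def push(directory: str, dry_run: bool = False) -> list[str]:
    """Spoke -> hub: send this machine's changes to the server."""
    return rsync_argv(f"{LOCAL_HOME}/{directory}", f"{HUB_MOUNT}/{directory}", dry_run)


def pull(directory: str, dry_run: bool = False) -> list[str]:
    """Hub -> spoke: bring the server's copy onto this machine."""
    return rsync_argv(f"{HUB_MOUNT}/{directory}", f"{LOCAL_HOME}/{directory}", dry_run)


if __name__ == "__main__":
    print(shlex.join(push("music", dry_run=True)))
    # rsync -av --dry-run /home/user/music/ /mnt/server/music/
```

To actually sync, pass the list to subprocess.run(cmd, check=True). The machine where changes were made pushes to the server, and the other spokes pull afterwards. The trailing slash is the detail most worth getting right: ‘rsync -av music/ dest/’ copies the contents of music, while ‘rsync -av music dest/’ creates dest/music.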

I’m not sure why I waited so long to do this. It is efficient and easy, and I recommend a similar setup if you have two or more computers. It nicely complements a version control system for files whose history doesn’t need to be tracked.

Cheap fetchmail Trick

I’ve had a fairly advanced email system running at home (Postfix, SpamAssassin, Dovecot, Fetchmail) for over a year. The original goal was spam filtering for some heavily spammed, 10-year-old POP3 accounts from my old ISP. I couldn’t be more pleased with the system: false positives [legitimate mail flagged as spam] are next to zero, and very few false negatives [spam] slip through. I might even write a guide to a similar setup in the future, but not today.

One of the kinks along the way involved fetchmail. There was an email account that I needed delivered to two separate local users’ inboxes. As far as I could tell, neither fetchmail nor its alternative, getmail, can do this natively: you simply can’t list the same mail account twice. Asking on IRC brought up odd solutions such as procmail or aliases, but I could think of several disadvantages, especially regarding the per-user Bayes learning provided by SpamAssassin.

So I got to thinking: what if I simply fooled fetchmail into thinking that the account was actually two different accounts or servers? Here is the resulting .fetchmailrc:

poll bowling2-kev
uidl
proto pop3
auth password
via pop.isp.tld
user "bowling@isp.tld"
pass "supersecretpassword"
keep
is kev009

poll bowling2-bowling
proto pop3
auth password
via pop.isp.tld
user "bowling@isp.tld."
pass "supersecretpassword"
is bowling

There are two key things going on here. The first entry uses the keep keyword, similar to the “leave mail on server” option in mail clients, so the messages are still on the server when the second entry polls. The interesting bit is in the second entry: on the line beginning with ‘user’, there is a trailing period. I am simply taking advantage of the fact that fetchmail treats this as a different host. In reality, a trailing period just marks a fully qualified domain name (FQDN) that explicitly includes the root DNS zone, so it is the same host.
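A tiny Python sketch of why the trick can work: a plain text comparison sees "isp.tld" and "isp.tld." as different, while DNS sees the same name. The assumption, inferred from the behaviour described above rather than from fetchmail's source, is that fetchmail tells entries apart by the text as written.

```python
def canonical_dns_name(name: str) -> str:
    """Normalize a domain name for comparison.

    DNS names are case-insensitive, and a single trailing dot only makes the
    root zone explicit, so "isp.tld." and "isp.tld" name the same domain.
    """
    name = name.lower()
    return name[:-1] if name.endswith(".") else name


def same_domain(a: str, b: str) -> bool:
    """True when two spellings refer to the same DNS name."""
    return canonical_dns_name(a) == canonical_dns_name(b)


if __name__ == "__main__":
    print("isp.tld" == "isp.tld.")             # False: plain text comparison
    print(same_domain("isp.tld", "isp.tld."))  # True: same name to DNS
```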

I speculate that by adding entries to the /etc/hosts file, this could be scaled to more than two users. Also, using uidl and keep on every entry would prevent a message that arrives between polls from reaching only the final fetchmail entry. The disadvantages would be server mailboxes that overflow unless they are pruned externally, and longer fetches.
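Here is a Python sketch of that speculative setup, generating one entry per local user in the same layout as the .fetchmailrc above, with uidl and keep on every entry. The host aliases (pop-a, pop-b, pop-c) and the third user "guest" are hypothetical: each alias would need its own /etc/hosts line pointing at the POP server's address, and whether fetchmail then treats the entries as separate servers is exactly the untested idea from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class LocalUser:
    name: str        # local account that should receive the mail
    host_alias: str  # hypothetical /etc/hosts alias for the POP server


def fetchmailrc(remote_user: str, password: str, users: list[LocalUser]) -> str:
    """Build one poll entry per local user, all for the same remote account.

    Every entry carries uidl and keep, so no entry deletes mail that a later
    entry has not fetched yet; the server mailbox must be pruned some other way.
    """
    entries = []
    for u in users:
        entries.append("\n".join([
            f"poll {u.host_alias}-{u.name}",
            "uidl",
            "proto pop3",
            "auth password",
            f"via {u.host_alias}",  # distinct alias per entry (the untested idea)
            f'user "{remote_user}"',
            f'pass "{password}"',
            "keep",
            f"is {u.name}",
        ]))
    return "\n\n".join(entries) + "\n"


if __name__ == "__main__":
    users = [LocalUser("kev009", "pop-a"), LocalUser("bowling", "pop-b"),
             LocalUser("guest", "pop-c")]
    print(fetchmailrc("bowling@isp.tld", "supersecretpassword", users), end="")
```

fetchmail refuses to use a control file that other users can read, so save the output as ~/.fetchmailrc with mode 0600.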

So there you have it: a quick-and-dirty hack for fetchmailing one remote account to two local users.

I Love IBM

Just taking a moment to express my appreciation for IBM. Yes, I’m a bit of an IBM fanboy.

POWER6 is gearing up for release, and it’s badass. Check out this Ars Technica article for a good rundown:
IBM’s POWER6 flies the coop at 4.7GHz

Simply put, IBM is the only company competing with Intel on silicon process technology. IBM is part of an alliance with Freescale, Chartered, Samsung, AMD, and others, so its innovations can and will trickle down to the consumer market.

This is what I’m talking about:
Made in IBM Labs: 10 Chip Breakthroughs in 10 Years

Thank you.