Phoronix Benchmarking: Statistically Significant? And Other Performance Concerns

Phoronix has been cranking out a slew of benchmarks recently with its own automated test suite, pitting various Linux distros, and even different operating systems, against each other.

  1. Ubuntu 7.04 to 8.10 Benchmarks
  2. Mac OS 10.5 vs. Ubuntu 8.10 Benchmarks
  3. Ubuntu vs. OpenSolaris vs. FreeBSD benchmarks
  4. Fedora 10 vs Ubuntu 8.10 Benchmarks
  5. “Real World” Benchmarks of the [sic x2...]EXT4 File-System
  6. OpenSolaris 2008.05 vs. 2008.11 Benchmarks

What I would like to know is… are they bullshit?  I’m no statistician, yet the closeness of the numbers and the lack of error bars set off my own bullshit detector.  See this URL for some background on statistical significance and error bars: http://www.graphpad.com/articles/errorbars.htm.
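To make the error bar complaint concrete, here is a minimal sketch of what I would like to see reported for each result: the mean of repeated runs, the standard error, and a Welch's t statistic for the difference between two systems.  The timings are invented purely for illustration (they are not Phoronix numbers), and the function names are my own.

```python
import math
import statistics


def summarize(runs):
    """Return (mean, standard error of the mean) for a list of timings."""
    mean = statistics.mean(runs)
    sem = statistics.stdev(runs) / math.sqrt(len(runs))
    return mean, sem


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variance."""
    mean_a, sem_a = summarize(a)
    mean_b, sem_b = summarize(b)
    return (mean_a - mean_b) / math.sqrt(sem_a ** 2 + sem_b ** 2)


# Hypothetical timings in seconds, made up for this example only.
distro_a = [41.2, 40.8, 42.1, 41.5, 40.9]
distro_b = [41.9, 41.1, 42.4, 41.0, 42.2]

for name, runs in (("A", distro_a), ("B", distro_b)):
    mean, sem = summarize(runs)
    print(f"distro {name}: {mean:.2f} s, error bar +/- {2 * sem:.2f} s (2 SEM)")

print(f"Welch t = {welch_t(distro_a, distro_b):.2f}")
```

As a rough rule of thumb, a t statistic whose magnitude is well under 2 suggests the gap between two bars is within run-to-run noise.  A proper test would also need the degrees of freedom and a p-value, which this sketch leaves out.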

I spot plenty of outlandish things, such as FPS benchmarks for file system tests and JDK version changes through product life cycles, not to mention the unfairness of comparing crappy binary graphics drivers across versions.  JDK 6 Update 10 (1.6.0_10) is SUPPOSED to be faster; benchmark results that include it are meaningless unless the same JDK is run across all versions.  Yes, GCC, glibc, and the kernel change between releases as well, but these are not typically components that a user can swap out as easily as a JVM, which should probably be bumped for security reasons anyway.

Furthermore, I realize all benchmarking should be taken with a grain of salt – one particular set of hardware and software will never map correctly to another set of hardware or software, but it should be possible to set up tests to gain some useful intelligence.

Can this kind of macro/micro benchmarking (depending on how you look at it) help weed out regressions?  GCC 4.0 was noticeably slower on x86 than 3.x (See: http://www.coyotegulch.com/reviews/gcc4/index.html, http://people.redhat.com/bkoz/benchmarks/).  At the same time I think PowerPC saw significant improvement due to auto-vectorization and use of AltiVec/VMX.  GCC 4.x also seems to be improving over time; I’ve heard 4.4 is supposed to be much better with a new register allocator (IRA).  The compiler is probably the most important component of modern open source operating systems, so some of the blame might be placed here if the numbers have meaning.

All of this makes LLVM look more and more appealing.  LLVM can optimize not only at compile time, but also at link time and run time.  This is very appealing for commercial software, where you are given a binary blob by the manufacturer that will likely not change through its lifetime.  It also reminds me of Java and its speedups through JVM upgrades, except this should work for any language.

“LLVM is… designed to enable effective program optimization across the entire lifetime of a program. LLVM supports effective optimization at compile time, link-time (particularly interprocedural), run-time and offline (i.e., after software is installed), while remaining transparent to developers”

One thing the Phoronix numbers do show is that things seemed to go downhill around the time CFS (Completely Fair Scheduler), dyntick, and SLUB were merged.

Evgeniy Polyakov of POHMELFS fame raised the alarm with some fairly significant networking regressions – how financial crisis affects tbench performance – that seem to support a general slowdown between 2.6.22 and 2.6.27.  This resulted in noise on LKML and hopefully we will see improvements soon.

I guess what I am getting at is that compute power is so cheap that it seems stupid not to have automated tests against such things these days.  Diego Pettenò of Gentoo fame has been doing such things recently with Gentoo’s excellent build system.  I have set up Hudson, a Java Continuous Integration system, before to track commit regressions, and such a system seems ideal for all modern software testing.
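The kind of check I have in mind is simple enough to sketch.  This is not Hudson’s API or anything Gentoo ships; it is a standalone illustration that a CI job could run after each build, with hypothetical function names, a made-up 5% tolerance, and invented timings.

```python
import statistics


def is_regression(baseline, candidate, tolerance=0.05):
    """True if the candidate's median timing exceeds the baseline's
    median by more than the given fractional tolerance."""
    base = statistics.median(baseline)
    cand = statistics.median(candidate)
    return (cand - base) / base > tolerance


def first_regression(history, tolerance=0.05):
    """history is a list of (commit_id, [timings]) in commit order.
    Return the first commit that regresses against its predecessor,
    or None if no step exceeds the tolerance."""
    for (_, prev_runs), (commit, runs) in zip(history, history[1:]):
        if is_regression(prev_runs, runs, tolerance):
            return commit
    return None


# Hypothetical per-commit timings in seconds, for illustration only.
history = [
    ("commit-a", [10.0, 10.1, 9.9]),
    ("commit-b", [10.0, 10.2, 10.1]),
    ("commit-c", [11.0, 11.2, 10.9]),
]

print(first_regression(history))
```

Comparing medians of several runs, rather than single numbers, keeps one noisy run from raising a false alarm.  Comparing each commit only to its predecessor will miss slow drift, so a real setup would probably also track a fixed baseline.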

Anyways, I am interested in hearing your thoughts on benchmarking, software testing, and automation and how it can be used to improve modern software.

KDE 4.2 beta 1 on Gentoo

KDE 4.2 is set for release on January 27th.  Eager to see what is new and improved, I installed beta 1 on my Gentoo box.  KDE 4.1.80 was tagged then released late last month after the feature freeze deadline. This is a snapshot of the current development tree that will eventually be released as KDE 4.2.

A quick blurb about KDE 4.1:
Busy with other things, I never wrote about KDE 4.1 on Gentoo. I’ve been using KDE 4.1 since 4.1.1 with the kde-testing overlay, which has since entered portage and been bumped to 4.1.3. This is a good release and well worth at least testing if you are a KDE user. I’ve found it to be quite stable and use it exclusively for everyday work.  The Gentoo KDE team has done an excellent job creating ebuilds for it.

What’s new?

Plasma received a lot of polish and is beginning to eclipse Kicker and indeed all other desktops and panels that I have used.  Much-needed features such as changing the panel height, auto-hide, and screen edge selection have been added.  The task bar is highly configurable in typical KDE fashion, allowing you to define task grouping, sorting, and filtering based on current desktop, screen, or minimized windows only, as well as allowing manual grouping.  The system tray also now allows hiding of unwanted tray icons.

Here’s a screen shot cluttered with various plasmoids for demonstration.  It’s nice to see the community thinking up some fun and useful plasmoids.

I wasn’t the biggest fan of the KDE 4 default menu.  Luckily, the Lancelot menu has been accepted upstream and is now an option on stock installs.  This menu is great for finding new applications (especially for new users) as well as for thumbing through with the keyboard.

I’m a Firefox user, but occasionally will fire up other browsers for testing or to avoid restoring a large previous session if I am in a hurry.  I’m happy to say that Konqueror feels much faster.  It also seems to work much better on AJAX-heavy sites such as Facebook.  When I spoofed the user agent to report Firefox 2, Facebook chat worked fine, an improvement over 4.1.  The continued merging of WebKit is clearly beneficial here.

One of my favorite KDE apps from 3.5, Ark, is also finally reaching feature parity.  I missed shell integration with Dolphin/Konqueror quite a bit and am happy to say it has returned.

Notifications are displayed and stack nicely in the lower right corner.  Operations such as downloading and moving files will show their status here.

Kontact gained Usenet support by means of Akonadi, which now has support for many data sources.  I think I will switch from Thunderbird/Lightning to Kontact with this release.

Kate has a new VI editing mode.  This is quite a nice text editor.

Amarok is shaping up as well.  This is 2.0 RC1, so it should be released on a date close to KDE 4.2.  Take a look at the different Internet media sources.  Last.fm support is now top notch!

Digikam 0.10-beta5, which seems to be stabilizing and evolving nicely, is another nice app.  Bonus points if you can identify all the retro machines.  The one on the left was probably the world’s first “green” PC.

Okteta, an easy-to-use hex editor, has also been updated.

I could go on and on showing the great progress.  I hope I hit the highlights, but you can check out the feature plan for yourself here: http://techbase.kde.org/Schedules/KDE4/4.2_Feature_Plan.  Noteworthy changes include improved multi-display, better desktop search with Strigi, and integrated power management.

Gentoo Installation

First, a shout out to the Gentoo KDE maintainers and testers.  Creating ebuilds for fast moving snapshots and live sources for a project this large is not an easy task.

Installation on Gentoo is fairly easy if you have layman.  ‘layman -a kde-crazy’ will add the KDE crazy overlay, which has KDE 4.2 unmasked and ready for testing.  If your box is ~arch, it should be as simple as ‘emerge -av @kde-4.2’ (see comments below for more info).  I recommend using the kdeprefix USE flag if you wish to test development releases so you can fall back to stable if things aren’t working correctly.  This will slot 4.x releases.

If you want a stable and usable environment, I still recommend sticking to 4.1.3 at the moment.  If you run a mostly stable Gentoo with KDE 3.5, you can find a package.keywords file in the kde-testing overlay as well as some other minor goodies.  These versions slot effortlessly, so switching back and forth isn’t a problem.

Conclusions

KDE 4.2 has come a long way since 4.0 and is a nice steady improvement over 4.1.  As I stated earlier, I use KDE 4.1.3 as my only desktop environment and am extremely pleased with it.  I have had no major issues and have gone over a month without crashing or restarting KDE, so the good old KDE 3.5 stability seems to be returning.  Having tried the current beta, I have no doubt that 4.2 will be just as stable by release.

Also, if you are an Nvidia user you owe it to yourself to try the latest 180.xx+ drivers.  As people have long been saying, many of the performance problems reported with KDE 4 were related to Nvidia cards and poor video drivers.  With the new drivers, KDE is lightning fast.

Here’s to an on-time and successful KDE 4.2 release.  I can’t wait to see what Qt 4.5 and KDE 4.3 will bring!

More Linux File Systems

It seems I caught the wave of interest in Linux file systems.  Here are some articles worth checking out:

Of course, there are some other file systems that I haven’t talked about that came up in the comments of my last post.  Most of these are special purpose, still on the fringes, or different in scope than the first article which was about local storage.

  • The native flash file systems: UBIFS and LogFS.  In theory file systems like these would be ideal for SSDs, but we need manufacturers to stop putting FAT/hard disk emulation and wear leveling into their drives.  Windows is the culprit for this.  These file systems also seem keyed toward embedded device flash memories at the moment and not general purpose storage.  Neither is upstream yet.  This could be positive, as the disk format and code can be readily changed to make them compatible with future SSDs if there is a change in manufacturing.  Val Henson, a Linux file system authority, has some interesting thoughts.
  • The log-structured file systems: LogFS (flash-centric, see above) and NILFS.  I believe ZFS and Tux3 borrow design philosophy from these.  The idea has been around for a long time, but none have ever really succeeded.
  • The shared disk and distributed parallel file systems: OCFS2, Lustre, GFS, PVFS.  There is a laundry list of these.  Some implement entire disk file systems, while others add clustering or distribution properties to other file systems.  SUSE and, I think, Red Hat even considered using one as the default file system, but booting (GRUB) is an issue.
  • Network file systems: these add networking to other disk file systems.  POHMELFS and CRFS (distributed extension for Btrfs) are interesting new ones here.  Of course there is a laundry list of network file systems for Linux.  NFS, AFS, and CIFS are the old timers.
  • The others: These are more experimental or research oriented.  chunkfs, spadfs, and many more.  Many others just don’t have steam behind them yet or are dead in the water.

My overview is clearly Linux oriented, though I mentioned ZFS in passing because I think it spurred a lot of these recent developments.  That isn’t to say the BSDs are sitting still: DragonFly BSD has HAMMER, FreeBSD is keeping UFS2 moving forward, and NetBSD has LFS, a log-structured file system.

P.S.:

I plan on performing some benchmarking soon after 2.6.28 goes stable.  The list will include ext2/3/4, JFS, XFS, Reiser3 and Btrfs.  The setup will include single and multi-disk configs.  If you have any requests or suggestions for setup, please contact me.