Archive for the ‘Linux’ Category

I dream of pervasive virtualization…

Friday, January 2nd, 2009

I dream of a day where virtualization is pervasive.

Instead of thinking about services in terms of servers, CPUs or directly mapped resources, I should be able to add virtual machines in terms of guaranteed throughput rate over a whole grid.  Scaling out should be as easy as adding a blade or racking another server.

At the low level, I should have the option of running N+N redundancy.  That is, the VM should run in lockstep across multiple machines - so if it is running on 2 vcpus, 4 in total would be used.  This would allow for any node to fail.  And the VM should be an aggregate of the low level hardware - e.g. a VM grid across 4 8-core servers should scale near-linearly when a single OS instance is running 32 processes.

Current solutions only attempt to do some of the tasks above, and most fail miserably.  IBM mainframes have been doing it for ages.

If I had the time, I know I could build software to do this better than anyone else.  All the puzzle pieces are there, especially the tough ones like hypervisors and Infiniband.  This could have been done at least 3 years ago.  I bet it will take the industry 3-4 years yet to get anywhere close.

This is a real virtual datacenter.

Xen 3.3 in RHEL/CentOS 5 and more Link Aggregation Fun

Thursday, January 1st, 2009

RHEL 5 includes the now ancient Xen 3.0 hypervisor.  A lot has been improved since then, especially in the current 3.3 release.  Additionally, Red Hat now owns the company behind KVM, so it is unlikely they will spend much time backporting Xen stuff for RHEL 5.3 or the likes.

Why Xen?

Xen is a proven hypervisor.  It works well on lots of hardware, including servers without hardware virtualization and older 64-bit Opterons that won’t run 64-bit guests in the likes of VMware.  Since the OS is usually paravirtualized, performance is top notch.  By making an OS aware of the environment it is running in, you can optimize it for virtualization.  KVM is playing catchup here, realizing that paravirtualization is still ideal for many things.

How..

Okay, so we are using or want to use Xen. Others have already built the packages we need, thankfully!

Head over to http://www.gitco.de/repo/ and grab the repo for your arch.  (Most likely wget http://www.gitco.de/repo/CentOS5-GITCO_x86_64.repo in /etc/yum.repos.d/ for the uninitiated).

If you already have Xen installed, you may need to remove and re-add it.

yum groupremove Virtualization
yum groupinstall Virtualization

You’ll also get some updated tools like Virtual Machine Monitor 0.6.0 that make it easier to install newer guests such as Fedora 10 or Ubuntu.  Sweet!

Double check /etc/sysconfig/kernel.  DEFAULTKERNEL should be set to kernel-xen.  Likewise, check /boot/grub/grub.conf and make sure that the Xen kernel is the default if the aforementioned was not done beforehand.

Reboot!

Xen 3.3 and Link Bonding

See my previous post for general information, but it gets harder.

This one is a nightmare.  In my previous post, I detailed how to get Xen to work with link aggregation with Xen 3.0.  Well, it doesn’t work in 3.3.  Xen decides that it still owns eth0 and completely destroys your bond0 setup.

Like these people, I’ve come to the conclusion that the integrated network scripts suck.  This is alarming since you’d think link bonded setups would be the norm for Xen setups.

The quick fix is to let the OS handle networking.  We do that like so: add a br0 interface and tell the bond to bridge with it.

File /etc/sysconfig/network-scripts/ifcfg-br0

DEVICE=br0
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.0.6.201
NETMASK=255.255.255.0
GATEWAY=10.0.6.1
NO_ALIASROUTING=yes
TYPE=Bridge

Then, edit your /etc/sysconfig/network-scripts/ifcfg-bond0, add "BRIDGE=br0" and comment out any IP related information (since you are now defining that in the bridge).  Head over to /etc/sysctl.conf and add:

net.ipv4.ip_forward = 1
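
As a rough sketch of where this leaves the bond, ifcfg-bond0 might look something like the following once the bridge line is in and the addresses are commented out.  ifcfg files are plain shell variable assignments, which is why it is shown as shell.  The address values are just the ones from the ifcfg-br0 example above, and your bonding options are assumed to stay wherever you already configured them.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (sketch, adjust to your setup)
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
# Hand the bond over to the bridge defined in ifcfg-br0
BRIDGE=br0
# IP settings now live on br0, so they are commented out here
#IPADDR=10.0.6.201
#NETMASK=255.255.255.0
#GATEWAY=10.0.6.1
```

The ip_forward setting in /etc/sysctl.conf can be applied without waiting for the reboot by running sysctl -p as root.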

Now, edit your Xen VMs in /etc/xen/ or /etc/xen/auto and change xenbr0 to br0:

vif = [ 'mac=ee:cc:aa:88:66:44, bridge=br0', ]

Okay, now disable the Xen networking garbage.  Open /etc/xen/xend-config.sxp and comment out anything  that looks like (network-script ….).
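
If you would rather not hand edit the file, here is a minimal sketch of the same change with sed.  The function name is made up for illustration, and it assumes the lines start with (network-script as they do in the stock config; it leaves (vif-script ...) lines alone.

```shell
# Comment out every (network-script ...) line.
# Reads the config on stdin and writes the result to stdout.
comment_network_script() {
    sed 's/^[[:space:]]*(network-script /#&/'
}

# Hypothetical usage (keep a backup of the original):
#   cp /etc/xen/xend-config.sxp /etc/xen/xend-config.sxp.bak
#   comment_network_script < /etc/xen/xend-config.sxp.bak > /etc/xen/xend-config.sxp
```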

Almost done, but wait!  RHEL 5.2 has a bug that prevents the bridge coming up on a bonded interface.  Hopefully this will make the 5.3 cut or be pushed to 5.2, but until then go here.  Download the new patch into /etc/sysconfig/network-scripts/ and run patch -p0 < ifup-eth.patch for instance.

Finish

Reboot.  You now have Xen 3.3 goodness on a big Ethernet channel!  Post a comment if you have any trouble or questions.

Link Bonding Craziness in RHEL/CentOS 5

Tuesday, December 30th, 2008

I just went through hell in a handbasket trying to get 802.3ad Link Aggregation set up on a CentOS 5.2 Xen box.  Setting up link aggregation itself isn’t that bad - see http://wiki.centos.org/TipsAndTricks/BondingInterfaces for a simple guide (after your managed switch is configured) - but whatever I did, I was unable to get both interfaces simultaneously active.

About the only useful debugging info I got was that the MAC was in use.  I was puzzled because as far as I know, link agg takes over the primary MAC and sets that up for both NICs.  Furthermore, the same exact hardware was working great on Fedora 10.

bonding: bond0: Warning: the permanent HWaddr of eth0
 - [MAC ADDR]- is still in use by bond0. Set the HWaddr of eth0
to a different address to avoid conflicts.
bonding: bond0: releasing active interface eth0
bonding: bond0: making interface eth1 the new active one.
bonding: bond0: Removing slave eth1
bonding: bond0: releasing active interface eth1
ADDRCONF(NETDEV_UP): bond0: link is not ready
bonding: bond0: Adding slave eth0.

Luckily, I stumbled across this bug report.  If you scroll down to the last comment, this appears to be a Xen specific issue.  By default Xen tries to set its bridge up on eth0, and I assume this prevents the kernel bonding driver from taking over the NIC.  By opening up /etc/xen/xend-config.sxp and adding:

(network-script 'network-bridge netdev=bond0')

Xen will bridge to the bond0 interface, and everything will work as expected.
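
To confirm that both NICs really are active afterwards, the bonding driver reports per-slave state in /proc/net/bonding/bond0.  The helper below is only a sketch (the function name is mine) that counts the slaves reporting "MII Status: up"; with a healthy two-port bond it should print 2.

```shell
# Count slaves whose MII status is "up".
# Reads the contents of /proc/net/bonding/<bond> on stdin.
count_active_slaves() {
    awk '/^Slave Interface:/ { in_slave = 1; next }
         in_slave && /^MII Status:/ { if ($3 == "up") n++; in_slave = 0 }
         END { print n + 0 }'
}

# Usage:
#   count_active_slaves < /proc/net/bonding/bond0
```

The in_slave flag is there so the bond-level "MII Status" line near the top of the file is not counted as a slave.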

Another trick I had to do was add a start delay to the networking scripts.  This is useful if your hardware is crap (cough Broadcom), you need a dhcp lease and it fails, or you are running STP, link aggr., etc.  On Fedora, RHEL, and derivatives this is accomplished by adding the NETWORKDELAY directive to /etc/sysconfig/network:

NETWORKING=yes
NETWORKDELAY=31

If you need more granularity, you can set delays for specific adapters in the /etc/sysconfig/network-scripts/ifcfg-{x} files with the LINKDELAY directive.
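
A per-adapter delay might look like the sketch below.  The adapter name and the 10 second value are assumptions for illustration only; pick whatever your switch needs to finish negotiating.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 (sketch; only LINKDELAY is the point here)
DEVICE=eth0
ONBOOT=yes
# Seconds to give this adapter for the link to settle (assumed value)
LINKDELAY=10
```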

Just a couple of hard lessons from the trenches.  Hopefully this will save someone else some time.

Political correctness in open source doesn’t matter

Thursday, December 25th, 2008

See: http://www.itwire.com/content/view/22467/1090/.  To answer the sensationalist title, no.

I can’t believe people are trying to make this an issue.  I guess it was only a matter of time before the crap that we deal with in the rest of the world met up with open source.

I have seen several prominent developers on just the kernel that just happen to be women.  Many more on large projects like KDE.  Great, big deal.  They shouldn’t receive special privileges, recognition, or anything because of it.

“The strange thing about this episode is that it looks like the FOSS community seemingly doesn’t want to know about it.”
No shit.  This kind of crybaby attitude is why governments and large corporations can’t get anything done, too worried about offending people.  Most of us FOSS people are here for the goods, not to set up bureaucracy, politics, and political correctness.

This Gentoo developer is spot on: http://steveno.wordpress.com/2008/12/17/mad-gnu-women/.  The existence of women-only groups like Debian Women is wrong in the first place and harmful.  In the same category as Richard Stallman - well intentioned but counter-productive.

Isn’t the goal to write and use good software?  Gender has nothing to do with that.  Neither does race, color, or being the stereotypical guy that spends countless hours hacking away in the parent’s basement.  Yet we think nothing about laughing at the last.  The world would be a better place if people just grew thicker skin.

How to upgrade to ext4 in place

Wednesday, December 24th, 2008

Here’s how you upgrade to ext4.  The process is pretty easy, but requires an fsck which means unmounting or rebooting if the file system is in use.

Make sure you are using at least e2fsprogs 1.41.3 and kernel 2.6.28 (or a vendor kernel with latest ext4 patches applied)!  Also, it’s probably a good idea to have proper backups (really!).  ext4 has just been declared stable, but what that really means is that the battle hardening has just begun.  I’ve done several heavily used systems without fault so far though, so it’s probably good enough for your desktop.

WARNING: DON’T CONVERT YOUR /boot PARTITION. Right now, there is no stable version of grub with ext4 support.  Even if there were, it really won’t gain you anything :-).

Run tune2fs, e.g.:

tune2fs -I 256 -O sparse_super,filetype,resize_inode,dir_index,ext_attr,has_journal,\
extents,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize /dev/sd[x][n]

Those are the default options for an ext4 file system if you were to create it with mkfs.ext4 (e2fsprogs 1.41.3 - see /etc/mke2fs.conf).  I’m getting pretty damn good performance with this!  The '-I 256' option sets 256 byte inodes, which most recent ext3 FSs use already.  If this is the case, and you get a message telling you so, remove this option.  Note that extents will make the FS backwards INCOMPATIBLE with ext3.
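
Before moving on, it can be worth checking that the features actually took.  tune2fs -l prints a "Filesystem features:" line, and the little helper below (my own naming, just a sketch) checks that line for a given feature.  Note the assumption that the extents feature is listed as "extent" in that output, which is how e2fsprogs normally displays it.

```shell
# Succeeds if the named feature appears on the "Filesystem features:" line
# of `tune2fs -l` output supplied on stdin.
fs_has_feature() {
    awk -v want="$1" '
        /^Filesystem features:/ {
            # Fields 1 and 2 are the label; features start at field 3
            for (i = 3; i <= NF; i++) if ($i == want) found = 1
        }
        END { exit found ? 0 : 1 }'
}

# Usage (the device name is a placeholder):
#   tune2fs -l /dev/sdXN | fs_has_feature extent && echo "extents enabled"
```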

Next, edit /etc/fstab, e.g.:

/dev/vg/home /home ext4 defaults 0 0

Either unmount and mount or reboot your system.  tune2fs marks the fs as dirty so that a fsck runs and finishes the conversion.
NOTICE: on distros that use an initrd, the initrd may need to be regenerated or you won’t be able to mount your root file system.  In Fedora (replace kernel version with your own):

cd /boot
mv initrd-2.6.27.7-134.fc10.i686.img initrd-2.6.27.7-134.fc10.i686.img.old
mkinitrd initrd-2.6.27.7-134.fc10.i686.img 2.6.27.7-134.fc10.i686

That’s all there is to it.  Stay tuned for future ext4 developments like online defragmentation.

Also, ext{2,3,4} reserve 5% of space for root in case the drive fills up.  On large modern drives, this can be excessive (e.g: 50GB on a 1TB disk).  Consider running 'tune2fs -m 1 /dev/sd[x][n]' to reduce this to 1%.
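
The arithmetic is simple enough to sanity check in the shell.  This is only a sketch with a made-up helper name, using round decimal gigabytes:

```shell
# Space held back for root, in GB, for a disk of $1 GB with $2 percent reserved.
reserved_gb() {
    echo $(( $1 * $2 / 100 ))
}

reserved_gb 1000 5   # 1TB disk at the default 5%: prints 50
reserved_gb 1000 1   # the same disk after tune2fs -m 1: prints 10
```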

For more information and tweaking:

  1. Documentation/filesystems/ext4.txt from the latest kernel sources
  2. http://ext4.wiki.kernel.org/index.php/Main_Page
  3. man tune2fs
  4. http://e2fsprogs.sourceforge.net/

NIN Echoplex live rehearsal

Friday, December 19th, 2008

One of my favorite blogs, Create Digital Music, ran a piece on NIN doing a rehearsal for Echoplex.

It’s neat to see the Lemur, a touch screen MIDI controller, used live by the drummer.  Overall the song structure is relatively simple but it is amazing how well they keep together.  Notice the dude manning their monitoring setup full time and the in ear monitors 8-)?

I need to get a hold of an old touchscreen PC and see if I can convert it into a Linux MIDI controller with Open Sound Control and maybe some custom Java.  It’s sad because I had something perfect for the task a few years ago, an IBM Industrial panel PC, basically a laptop in a bulletproof magnesium case with touchscreen :-(.

If you are interested in this sort of thing, click through, otherwise enjoy the tune below.

http://createdigitalmusic.com/2008/12/17/how-they-work-nin-echoplex-rehearsing-live-with-lemur/

Also, check out Linux Journal’s Java Sound and Music three part series.


NIN: Echoplex - Live at Rehearsals, July 2008 from Nine Inch Nails on Vimeo.

Linux and the PC Mentality

Wednesday, December 17th, 2008

Here’s a minor thought dump of the moment: no hard numbers, just some general observations over the past year or two.

I’ve heard the comparison between Linux and other UNIX like OSes a few times in the past.  Of course it is a bit hard to compare “Linux” - which is a kernel - and the others which most often are complete operating systems.  Many times they even share a lot of GNU userland tools, so I will disregard userland.

The general feel I get from looking at some parts of Linux and its development process is that of the PC world.  Build cheap, throwaway solutions and rip and replace later.  Looking at things like the scheduler, FreeBSD was able to grow a similarly advanced solution in a much more organized fashion, without ripping and replacing this critical component several times, and with far fewer developers.  *BSD has a more traditional UNIX process from the days of the minicomputer and professional workstation, while Linux is deeply rooted in PC culture.

Aside from the scheduler, there seem to be other places where development has tarnish before shine.  hrtimers, dyntick, CFS, and SLUB are some of the more controversial moves in the past year or so.  Indeed, kernels 2.6.23+ didn’t “feel” as polished as the few leading up to it and “felt” less responsive to me when playing audio or video concurrent with other tasks like compiling - something CFS was supposed to fix, but I never had a problem with before.  Going into 2.6.28, these big changes are starting to settle down.  Things once again “feel” better, more stable, and better performing - so I guess it is just a phase.

From a software engineering standpoint, it is interesting to ponder how to improve the process.  One thing Linux has that many other projects don’t is an enormous number of contributors.  A lot of people and companies are trying to do different things with the kernel.  It is important for them to get their code into the kernel as soon as possible, because it is such a fast moving process, but new features lead to bugs.  Linux-next appears to be helping by keeping in-progress trees so that developers can explore ideas before pushing to mainline, and Adrian Bunk’s idea of the “long time support” kernels for those of us that don’t run large distro kernels seems like a step in the right direction.  I do wish he would forgo 2.6.27 and base the next LTS on 2.6.28 though.  2.6.27 has had quite a lot of major patching done since release and as I said earlier, 2.6.28 just looks better so far.

Another point: technically superior solutions don’t always get favored.  I’ve said before that I think there were better choices for file systems than ext3 for the past 8 or so years.  CFS vs. RSDL is another example.  The latest is SCST vs. STGT (http://lwn.net/Articles/311344/).  SCST is clearly the only solution for the enterprise where Fibre Channel and Infiniband are likely to be used.  And Ingo and Thomas are at it again reinventing the wheel with performance counters.

The counterargument is that Linux runs on everything from most of the world’s supercomputers all the way down to small critical embedded systems.  Most of these are running obsolete kernels with heavy vendor backports though.

Running the latest kernel is fun; after all I am a Gentoo user so I don’t mind when things break, but for truly mission critical tasks I would not be afraid to look outside of Linux to BSD.  Ironically, Solaris, once the stalwart of enterprise, seems to be embracing the Linux mentality.

USB 3.0 on Linux

Sunday, December 14th, 2008

Take a look at this Intel developer’s blog:

http://sarah.thesharps.us/2008-12-07-13-35.cherry

The video shows a USB flash drive transferring video at 125MB/s!  To give you an idea of this speed, it is likely more than your hard disk can put out, which is probably around 80MB/s.  The rest of the article gives a good overview of USB 3.0 and she states that the bus should have about 400MB/s bandwidth in the real world.  This is a breath of fresh air for external devices of all sorts, and I can’t wait to see this on new computers.

The Linux drivers are currently under development, with the subsystem patches going into review.  The xHCI driver will have to wait until Intel finalizes and releases the specs.  Anyways, it’s safe to say that Linux should have USB 3.0 support as soon as products hit the market in the middle of 2009.

Bulletproof your server to survive Digg/Slashdot

Saturday, December 13th, 2008

implementing scale up for web 2.0 sites with current practices

This blog was recently featured on Slashdot over the Thanksgiving holiday in the US.  It was the perfect storm: commercial news organizations were mostly dormant creating a slow news day, and geeks like me were at home eager to get the latest technology scoop.  What surprised me is how this relatively modest box, a Linode 540MB Xen Virtual Machine, withstood up to 100+ requests a second without even breaking a sweat.  Furthermore, I had only performed some of the tuning I detail below.  It scales to over 1100 requests a second after following my guide below!

I will detail how to tune your server for optimum capacity, or what I will call free scale up (as opposed to scale up by adding hardware or scale out - adding machines, database servers, application servers, load balancing - which may come in a future article depending on interest).  Most of the ideas here are platform neutral - both OS and application server - assuming you are using a UNIX style OS.

The only tutorials I’ve found were dated and don’t detail the latest practices like varnish or Passenger, so read on for a fresh look.

Audience

The intended audience for this article is anyone running a web site.  Running your own web server gives much greater flexibility in choice of development environment.  A dedicated server and certain virtual private server providers give much more predictable performance and won’t cancel your service on a whim. (I’m looking at you MediaTemple… Google for horror stories).  A Linode VDS is much more flexible and very powerful for around the same cost.

Web Server

Most people use Apache.  According to Netcraft, over 50% of hosts were running it as of November.  For good reason: Apache has proven stability, scalability, and security.  Some folks are quick to rip out Apache due to poor configuration and tuning.  I personally find it to be an excellent choice for most sites because of the aforementioned traits and first-rate extensions.  With proper setup, you will likely max your transfer or tax your application server before it ever becomes the bottleneck.

Apache Tuning

The key to tuning Apache is to minimize RAM usage, especially on a limited machine like my 512MB Linode.  Memory swapping of applications to disk is almost entirely unacceptable on modern servers.  Disk I/O is very expensive and the biggest bottleneck on modern computers, which is why swap is so unappealing.

Therefore, you need to:

  1. Limit overall Apache memory usage
  2. Minimize per thread/process memory usage
  3. Minimize disk I/O

Limit overall memory usage

Step one is very important.  If your server begins swapping heavily, it can be very difficult to even log on and perform administration.  You need to develop an idea of the RAM an average Apache process is using via top, ps, or another monitoring framework.  Make sure you are looking at the RES column in top, and keep in mind that shared libraries are shared between all processes.  Take the amount of available RAM and divide it by this number.  Available RAM should take into account RAM used by other processes, including your database, when under comparable load.  Set the MaxClients directive to a number close to the result, and tune accordingly with benchmarks (see Benchmarking section).
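
Here is a minimal sketch of that calculation.  The function names are mine, the process name httpd is an assumption (it is apache2 on Debian style systems), and the 300MB figure in the usage line is only a placeholder for whatever RAM you decide Apache may have.

```shell
# Average resident size in MB, given one RSS value in KB per line on stdin
# (the format produced by `ps -o rss=`).
avg_rss_mb() {
    awk '{ sum += $1; n++ } END { if (n) print int(sum / n / 1024); else print 0 }'
}

# Rough MaxClients estimate: RAM available to Apache (MB) / per-process size (MB).
estimate_max_clients() {
    [ "$2" -gt 0 ] || { echo 0; return 1; }
    echo $(( $1 / $2 ))
}

# Usage:
#   per_proc=$(ps -C httpd -o rss= | avg_rss_mb)
#   estimate_max_clients 300 "$per_proc"
```

Treat the result as a starting point only.  RSS counts shared pages in every process, so the real ceiling is usually somewhat higher, which is why benchmarking afterwards matters.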

Minimize memory usage

Step two determines how many child processes you can handle.  This is important because the more children, the more in flight requests, and lower end user latency.  This is also a lot more environment dependent than step one.

A good way to reduce memory consumption is to unload unneeded modules.  Most server operating systems default Apache with a wide range of modules that are probably not used on your site including several basic authentication methods.  Using shared objects rather than static modules will help memory usage as well, and most distributions ship this way.

If you use an Apache module for your application server (mod_php, mod_perl, mod_python, Passenger aka mod_rails), each child process will consume the memory of that module regardless of whether or not it is serving a static asset (images, css, etc.) or an application page.  Mitigate this by using a proxy (see next section) or moving application serving to its own processes via FastCGI (PHP, most others), AJP (Java, Python), WSGI (newer Python), proxy (Ruby, all).

Disable logging

I should take a moment to step back and hit on an important topic.  Hard disks have improved very little in regard to performance in recent years.  Disk I/O is an expensive task and therefore the primary bottleneck you wish to avoid.

When Apache logging is enabled, a write operation must occur for every hit.  If possible, consider completely disabling access logging.  You can outsource web statistics to Google Analytics.  If you require logging, make sure HostnameLookups is disabled (network I/O is even more expensive than disk!) and batch look-ups on another machine or during idle periods with a log analyzer.  As your setup grows (scale-out), log files will become more cumbersome and you will probably be logging to a database or a central server anyways.  Varnish, a proxy/http accelerator detailed below, has an optimized design for logging.

mod_cache

Apache has an integrated cache module that will keep frequently hit static assets in memory.  For larger sites, forgo this and use a proxy which will be more flexible and allow easier scale-out.

MPMs

Apache makes use of MPMs, or Multi-Processing Modules, for its core functionality.  The default on UNIX is prefork, which handles each connection in its own separate process.  By switching to a threading MPM such as worker or event, you can cut down overhead and memory use.  Some modules do not play well with threading (PHP), so you should research before changing MPMs.  prefork works well for one and two core servers.

Alternative Web Servers

Lighttpd is the leading alternative FOSS web server.  Users include A-list web sites such as YouTube and Wikipedia.  Benchmarks show impressive performance.  Keep in mind Apache is by no means slow nor resource intensive and links on that page show that it is faster on some workloads.

When making comparisons, keep in mind that by design you will probably be using a FastCGI application server and most of the optimizations above will hold true for Lighty.

For sites with long connection times (download servers, AJAX keep-alive) or static content servers, I would definitely lean toward it (scale-out).

Nginx has also been picking up steam (pun intended) and is being used by large sites like WordPress.com.  I would consider it in the same class as Lighty.

Reverse Proxying

A reverse proxy is very useful for modern web serving.  Even with just one server, a reverse proxy will keep common pages in memory - greatly reducing disk I/O.  It will also keep static requests from using potentially heavy application server HTTPd processes.  Proxies are often very fast at basic HTTP since they are not concerned with all the features of a web server.  When it comes time to scale-out, the proxy can be moved to a separate server.  Proxies can direct traffic to different backend servers.  Proxies can even be placed in geographically dispersed areas (think CDNs: Akamai, Limelight - Youtube, Google).  Logging, compression, and SSL can be offloaded to the proxy.  In short, you want a proxy even on a single server (or at least mod_cache).

Varnish

Varnish bills itself as an HTTP accelerator.  It was written from the ground up to perform reverse proxying, and this it does well.  The Varnish design philosophy is enlightened and leaves a lot of the work like memory management to modern advanced operating systems.  Logging is performed in a separate process and is optimized.  If you need an advanced proxy and accelerator, this is likely the way to go.

Squid

Squid has traditionally been used as the de facto FOSS forward and reverse proxy.  Many large sites such as Wikipedia are extensive users.

Apache and Lighttpd

Both Apache and Lighttpd have modules that will allow them to cache and reverse proxy.  For single server setups, it would probably be worth reusing the components of your web server (think: shared memory) if your application server is external.  mod_proxy is very useful for forwarding ruby requests to a Ruby web server like mongrel or thin.

Application Server

The application server is where most of the magic happens in today’s web 2.0 sites.  Gone are the days of static HTML files.  Most sites are now dynamically generated every visit, and customized per visitor.  This is an order of magnitude more complex, and a lot of CPU time is spent on page generation.  Therefore, tuning here is often one of the best things you can do to improve site scalability.

PHP

PHP is the most widely deployed language on the web.  Many extremely popular applications are written in PHP, including: MediaWiki, Wordpress, Drupal, and phpBB.

Opcode Cache

By default, PHP breaks a script down into opcodes every time it is called.  Opcode translation is necessary to simplify programs so they can easily be parsed by the Zend Engine.  It is unnecessary for this to be done every time a script is called since the source code will rarely change once deployed.  Luckily, a cache can be added that will eliminate this step.  The net performance gain can be a factor of 2 to 10, very impressive for a simple install!

These days, you should choose APC - The Alternative PHP Cache.  Once upon a time, there were several choices here.  Turck MMCache was notably fast, beating even the commercial Zend Suite, but mysteriously died out (the original author is now a Zend employee. hmm.. coincidence??).  Others have tried to revive it in the form of eAccelerator, but it is neither stable nor active.  Any other arguments are moot since APC will be part of PHP6 core as well as having PHP’s founder as a developer.

Modules

Just as with Apache, removing unused extensions in PHP will help reduce memory usage.  These can be commented out in php.ini.

Rails

Rails has gained a lot of steam (okay I’m wearing that one out) and is a favorite among many Web 2.0 startups including Twitter.

A lot of Rails scalability problems are due to the underlying Ruby language.  The garbage collector, threading and memory allocator have been pinpointed to be particularly bad.  Work is underway to fix these in Ruby 1.9 (bytecode) and 2.0 (threading).  In the meantime, consider Ruby Enterprise Edition in tandem with Passenger.  Personally, I’d rather avoid Ruby and all you kool-aid drinkers (but I’ve done a large deployment of Passenger).  Go Python :).

Python

Python is just a plain good language.  With that out of the way, like all the other scripting languages, Python is supposed to be getting a JIT compiling implementation sooner or later.  Psyco can yield an average 4x performance improvement and is available now.  PyPy should be here sooner rather than later.

Java

Due to the Java language design, code is JIT (Just-In-Time) compiled and you don’t have the compilation problem that the dynamic languages above do.

Java web apps are immensely complex, and aside from running the latest JDK (1.6.0_10), your container will play a big role in speed.  Jetty and Tomcat are always good choices.

Databases and Database Caching

A large portion of modern web applications are database driven.  To keep your site running, this point of contention must be addressed.  MySQL is ubiquitous and known for its speed.  PostgreSQL offers some advanced features and is known as the DBA’s FOSS database.  If you need extreme scalability, consider DB2 but prepare to pay dearly :-).

MySQL

MySQL comes configured fairly well out of the box in most distributions.  MySQL Performance Blog sums it up better than I can, so head that way for basic tuning info.

Probably one of the easiest things you can do is enable the integrated query cache.  The good news is your application doesn’t need to do anything to take advantage of this.

in my.cnf:

query_cache_size = 64M

For single server web workloads, this simple change can work miracles and prevent dreaded MySQL connection errors.  This is especially true since web apps are primarily read oriented.  The query cache isn’t perfect in all situations, and in larger sites memcached is more appropriate but has its own disadvantages (see memcached section).
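
To see whether the cache is earning its keep, MySQL exposes the Qcache_hits and Com_select status counters.  A common rule of thumb is hits / (hits + selects); the helper below is only a sketch of that arithmetic, with a function name of my own choosing.

```shell
# Percentage of SELECTs answered from the query cache.
# $1 = Qcache_hits, $2 = Com_select (selects that actually had to be executed).
qcache_hit_pct() {
    hits=$1
    selects=$2
    total=$(( hits + selects ))
    [ "$total" -gt 0 ] || { echo 0; return; }
    echo $(( hits * 100 / total ))
}

# The two counters can be read with:
#   mysql -e "SHOW GLOBAL STATUS LIKE 'Qcache_hits'; SHOW GLOBAL STATUS LIKE 'Com_select'"
# and then fed in, e.g.:
#   qcache_hit_pct 300 100
```

A low percentage on a read-heavy site suggests the cache is too small or is being invalidated by frequent writes.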

PostgreSQL

PostgreSQL should also be set up fairly well by your distribution.  shared_buffers should probably be tuned, as well as max_connections.  See the PostgreSQL wiki on tuning for a good overview.

There is nothing strictly akin to the MySQL query cache, for better or worse.

Applications and Application Caching

This is potentially the hardest step to implement, yet can also yield the greatest reward.  Caching common database queries, objects, modules, or even writing static HTML versions of a page can cut server load to nothing.  If you are using a common FOSS (free) or COTS (commercial) product, chances are the software already implements some of these options and they may just need to be activated or downloaded as an extension.

Keep in mind not all things are effectively cached, and you may need to perform a major rework to implement aggressive caching like this.

Generic Data Caching - memcached and APC

Many common applications contain backends for caching against memcached or APC.  Mediawiki is a prime example of this, which integrates nicely with memcached or APC.  If you are writing your own apps, using a memory cache can greatly reduce dependency on the database.

memcached

Realizing that databases have a lot of constraints, the folks at LiveJournal.com wrote a generic caching framework called memcached.  Most large sites such as Facebook, Wikipedia, and Slashdot are all using this.

The bad news is you have to port your application to store and check against memcached.  Database queries are a prime target, but just about anything can be stored here.

It is also handy for scale-out because you can add dedicated cache servers.

APC user cache

PHP APC users can manually store information in APC’s shared memory.  This is ideal for single server solutions.  Take a look at this performance comparison vs memcached and files.

Application Caching

Although most pages are dynamically generated these days, a lot are needlessly so.  For example, a content management system page might include a header, content, comments and a footer.  This output can be written out as a static HTML page whenever an author updates it.  Static pages are then served until a user comments on an article, which triggers a cache invalidation and the page is rendered and stored again.  The output of generated menus, columns, and other objects can be stored in cache form as well.
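
The render, serve and invalidate cycle described above can be sketched in a few lines.  Everything here is hypothetical: render_page stands in for your real (expensive) page generation, and the cache directory is an arbitrary choice.

```shell
# Where static copies are kept (arbitrary default for this sketch)
CACHE_DIR=${CACHE_DIR:-/tmp/page-cache}

# Hypothetical stand-in for the expensive dynamic render
render_page() {
    printf '<html><body>article %s</body></html>\n' "$1"
}

# Serve the static copy if present; otherwise render once and store it.
serve_page() {
    mkdir -p "$CACHE_DIR"
    cached="$CACHE_DIR/$1.html"
    [ -f "$cached" ] || render_page "$1" > "$cached"
    cat "$cached"
}

# Call this when an article is edited or commented on.
invalidate_page() {
    rm -f "$CACHE_DIR/$1.html"
}
```

In a real deployment the web server would serve the stored file directly, so the application is only touched on a cache miss.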

Wordpress Cache Plugins

Wordpress has a couple of plugins that are mandatory for large sites.

WP Super Cache will generate static HTML files of posts on your blog.  They are automatically served via some mod_rewrite magic, and will expire and update automatically.  This can effectively reduce load to almost nothing - it completely eliminates database access and PHP execution.

WP Widget Cache is a nice addition that will cache output of widgets (sidebar elements such as menus) that don’t commonly change.

Benchmarking

It is important to benchmark your site after making changes to see if it meets performance expectations.  ab is a common tool for this task.

The following will issue 3000 requests in total against localhost, 10 at a time:

ab -c10 -n3000 http://localhost/

Be very careful when benchmarking a live site.  You could effectively Denial of Service your server while it is processing all those requests.
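
When comparing runs before and after a tuning change, the number to watch in ab's report is the "Requests per second" line.  A small filter like the sketch below (the function name is mine) pulls it out so runs are easy to compare.

```shell
# Extract the mean requests-per-second figure from ab's report on stdin.
ab_requests_per_second() {
    awk '/^Requests per second:/ { print $4 }'
}

# Usage:
#   ab -c10 -n3000 http://localhost/ | ab_requests_per_second
```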

What do you think?

I’d be happy to hear your stories from the trenches.  Please share your tuning advice!

Tux3 by Christmas?

Wednesday, December 10th, 2008

Development seems to be going well for Tux3.

Daniel Phillips of Tux3 just posted the following to the LKML:

The big goals for Christmas (this Christmas!) are:

  • SMP locking
  • Atomic commit
  • Posixly complete
  • Rudimentary fsck

With the following comical reference

With atomic commit, we will progress from “buggy Ext2 equivalent with missing features” to “buggy Ext3 equivalent with missing features”.

Not a bad place to arrive at in five months, starting from scratch. Does anybody out there still doubt that the community process works, and is the best way to develop really complex software? Believe it.

See the whole post here: http://lkml.org/lkml/2008/12/10/358