Mirroring Fedora

Introduction

This post details setting up your own private mirror of Fedora’s repos.  There are many ways to do this, but this method is by far the best for heavy usage.  By using MirrorManager, clients in your IP range need no custom configuration.  Roaming laptop users automagically hit your mirror while on the premises, yet use the public infrastructure elsewhere.  Setup isn’t exactly hard, but it isn’t well documented so I’ll write about my experience here.

Some background info: we have at least 50 Linux desktops, laptops, servers and VMs at work, about half running Fedora 10 and half Fedora 11.  Due to the number of systems, breadth of packages used, and desire to quickly update when new releases are out, I decided on a full mirror setup.  If you only have a handful of systems, you may be better off simply using a general purpose caching proxy like Squid, perhaps telling MirrorManager to point to it.

This guide should be used in addition to http://fedoraproject.org/wiki/Infrastructure/Mirroring which has some background info.

Initial setup and mirror

First, get prepared by installing MirrorManager-client, which contains the report_mirror script you will need.  If your mirror isn’t running Fedora, you can clone the source of this app from the MirrorManager Git repo.

yum install mirrormanager-client

You’ll be using rsync, a sysadmin’s best friend, for efficient mirroring.

Set up a shell script like mine below (domirror.sh) one level up from where your mirror will be accessible (http, ftp, rsync, nfs – covered later).  This one mirrors from kernel.org; choose whichever mirror is closest to you on the Internet.

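#!/bin/sh
# Pull the tree from upstream, skipping anything matched in fedora-excludes.txt
rsync -vaH --exclude-from=fedora-excludes.txt --numeric-ids --delete --delete-delay \
 --delay-updates rsync://mirrors.kernel.org/fedora-enchilada fedora-mirror
# Tell MirrorManager what this mirror now carries
report_mirror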

And a text file (fedora-excludes.txt) excluding things you don’t want/need.  Take a look through a public mirror and decide if you want to eliminate anything else.  You may want to remove the *.iso line below if you want users to be able to pull disc images from this box.  Otherwise, this is probably a good list for most people.  You can exclude all of linux/updates/testing/ if you don’t enable the testing repo on any of your machines.

**/debug/**
**/alpha/**
**/source/**
**/SRPMS/**
**/*.iso
**/ppc/**
**/ppc64/**
linux/core/**
linux/development/**
linux/releases/7/**
linux/releases/8/**
linux/releases/9/**
linux/releases/test/**
linux/updates/8/**
linux/updates/9/**
linux/updates/testing/7/**
linux/updates/testing/8/**
linux/updates/testing/9/**

Run your shell script and sit back for up to a day or two depending on your connection speed.  My current mirror weighs in at about 80G.
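
One thing to watch: the script as written calls report_mirror even when rsync exits with an error, so it would report on a tree that may be incomplete.  Below is a minimal sketch of a more defensive variant.  The function name mirror_and_report and the SYNC_CMD/REPORT_CMD variables are made up for illustration and are not part of MirrorManager; the rsync command line is the same one used above.

```shell
#!/bin/sh
# Hypothetical wrapper: only report to MirrorManager after a clean sync.
# SYNC_CMD and REPORT_CMD are illustrative names; override them to taste.
SYNC_CMD=${SYNC_CMD:-"rsync -vaH --exclude-from=fedora-excludes.txt --numeric-ids --delete --delete-delay --delay-updates rsync://mirrors.kernel.org/fedora-enchilada fedora-mirror"}
REPORT_CMD=${REPORT_CMD:-report_mirror}

mirror_and_report() {
    # The variables are expanded unquoted on purpose so that the
    # command string splits into the program name and its arguments.
    if $SYNC_CMD; then
        $REPORT_CMD
    else
        status=$?
        echo "sync failed with status $status; skipping report" >&2
        return "$status"
    fi
}
```

Call mirror_and_report at the end of the script in place of the bare rsync and report_mirror lines.  A non-zero exit also gives cron something to complain about if the upstream mirror is down.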

Internal distribution

While you wait for sync, decide how you want to run the service internally.  HTTP is nice because it is easy for users to browse and decently quick with keep-alive.  NFS, rsync, or FTP may be a bit more efficient if protocol overhead worries you.  You can list several URLs in MirrorManager for the best of all worlds.

Add the following to your Apache configuration if you decide to use HTTP:

Alias /fedora/ "/mnt/ar1/fedora-mirror/"

AddType application/octet-stream .rpm

<Directory "/mnt/ar1/fedora-mirror">
    Options Indexes FollowSymLinks
    Order allow,deny
    Allow from all
</Directory>

<LocationMatch "\.(xml|xml\.gz|xml\.asc|sqlite)">
    Header set Cache-Control "must-revalidate"
    ExpiresActive On
    ExpiresDefault "now"
</LocationMatch>

Set up any other services of your choice to serve that directory as well.

Working with MirrorManager client and server

Next, open up /etc/mirrormanager-client/report_mirror.conf.  Take notice of the site name, password, and host name.  You will need to set these up in MirrorManager in a bit.  The paths here are all local and used by report_mirror to check what you have available.

# if enabled=0, no data is sent to the database
enabled=1
server=https://admin.fedoraproject.org/mirrormanager/xmlrpc

[site]
# if enabled=0, no data about this site is sent to the database
enabled=1
name=<yoursitename>
password=<yourhostpassword>

[host]
# if enabled=0, no data about this host is sent to the database
enabled=1
name=x345-a2.internal
# if user_active=0, no data about this category is given to the public
# This can be used to toggle between serving and not serving data,
# such enabled during the nighttime (when you have more idle bandwidth
# available) and disabled during the daytime.
# not passing it means leave it alone in the database.

[stats]
# Stats are only sent when run with the -s option
# and when this section is enabled.
enabled=0
apache=/var/log/httpd/access_log
vsftpd=/var/log/vsftpd.log
# remember to enable log file and transfer logging in rsyncd.conf
rsyncd=/var/log/rsyncd.log

[Fedora Linux]
enabled=1
path=/mnt/ar1/fedora-mirror/linux

[Fedora EPEL]
path=/var/www/html/pub/epel
enabled=0

# lesser used categories below

[Fedora Web]
enabled=0
path=/var/www/html/pub/fedora/web

[Fedora Secondary Arches]
enabled=0
path=/var/www/html/pub/fedora-secondary

[Fedora Other]
enabled=0
path=/var/www/html/pub/alt

# historical content

[Fedora Core]
# if enabled=0, no data about this host is sent to the database
enabled=0
path=/var/www/html/pub/fedora/linux/core

[Fedora Extras]
enabled=0
path=/var/www/html/pub/fedora/linux/extras

Log in to https://admin.fedoraproject.org/mirrormanager, creating a new account if you need to.  Add a new site with the same name as in the config file above.  You’ll set the site password here, and make sure to check the ‘private’ box if this is only for internal users.  Now, add a host under this site.  The name here should probably be the FQDN of your actual mirror, even if it is internal only (e.g. x345-a2.internal from my example above).  Once that is done, add a “site-local netblock”.  This is your public IP network/netmask or network in CIDR notation.  If you only have one public IP, it will be in the format nnn.nnn.nnn.nnn/32.
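
If CIDR notation is unfamiliar, the sketch below shows the kind of test a netblock implies: an address belongs to network/prefix when its leading prefix bits match the network’s.  This is only an illustration in plain sh, not MirrorManager’s actual code.  The function names are invented, and the 192.0.2.0/24 and 203.0.113.0/24 addresses in the usage note are documentation placeholders; substitute your own public range.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 falls inside netblock $2 (network/prefix).
in_netblock() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    # Keep the top $bits bits, clear the rest (4294967295 is 2^32 - 1).
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}
```

For example, "in_netblock 192.0.2.17 192.0.2.0/24" succeeds, while a single-address block such as 203.0.113.5/32 matches only 203.0.113.5 itself.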

Almost done.  Now, click Add Category.  “Fedora Linux” is the only one you are concerned with if you followed all the values in this guide so far.  Add the others if needed.  Tell them your upstream source (rsync://mirrors.kernel.org/fedora-enchilada from above) and then your internal URL (http://x345-a2.internal/fedora/linux for my setup).

Conclusion

Once your rsync is complete and report_mirror is done, you should see clients start hitting your box.  Don’t forget to add your mirror script (domirror.sh from above, which runs rsync and report_mirror) to cron!  You may wish to join the private ‘fedora-mirrors’ mailing list to be informed of new releases and changes.
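
Because a sync can occasionally run long, it is worth making sure two cron runs never overlap.  The sketch below uses a lock directory for that.  The run_exclusive name, the lock path, and the schedule in the comment are all placeholders of mine; the /mnt/ar1 directory follows from the Apache config above, since the script lives one level up from the mirror.

```shell
#!/bin/sh
# Example crontab line (schedule is a placeholder). The cd matters because
# the mirror script uses paths relative to its own directory:
#   15 3 * * * cd /mnt/ar1 && ./domirror.sh >/dev/null

LOCK_DIR=${LOCK_DIR:-/tmp/domirror.lock}   # hypothetical lock location

# Run the given command only if no other run currently holds the lock.
run_exclusive() {
    # mkdir is atomic: it fails if the directory already exists.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous mirror run still in progress; exiting" >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    rmdir "$LOCK_DIR"
    return "$status"
}
```

Inside domirror.sh you would then wrap the slow step, for instance "run_exclusive rsync ..." with the usual arguments.  If the machine reboots mid-sync the lock directory is left behind and has to be removed by hand, which is the price of keeping the sketch this simple.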

The best thing is that it works across all package requests, including new machines, roaming users,  ‘preupgrade’, etc.   All in all, pretty nifty!  Your users will love you when their upgrades are almost instant!  The Fedora infrastructure is set up very well for mirroring, public and private, and this is how the project copes with the huge demand for new releases.  Comment away if you need clarification or help.
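
To check that clients really are landing on the mirror, you can tally requests per client from the Apache access log.  This is a rough sketch that assumes the stock common/combined log format (client address in field 1, request path in field 7) and the /fedora/ alias from the Apache config above.  The count_mirror_hits name is my own.

```shell
#!/bin/sh
# Tally requests under /fedora/ per client address, busiest client first.
# Reads the log files named as arguments, or standard input if none.
count_mirror_hits() {
    awk '$7 ~ /^\/fedora\// { hits[$1]++ }
         END { for (ip in hits) print hits[ip], ip }' "$@" | sort -rn
}
```

Run it as "count_mirror_hits /var/log/httpd/access_log".  Each output line is a request count followed by the client address.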

Kernel 2.6.30 is a Go

I initially thought this would be a rather uninteresting release, especially when we learned Xen dom0 didn’t make the cut. Following the changelog line-by-line, this one still didn’t seem very interesting to me. But analyzing the sum of parts, I have to consider 2.6.30 a ‘golden’ kernel — certainly the best in a while.

There is solid improvement top to bottom here.  A lot of the new KMS/DRM stuff from Fedora 11 has worked its way upstream.  There is too much file system work to cover in full, but highlights include relatime, writeback by default for ext3, NILFS2, Btrfs development and more.  FSCache works as advertised.  There is also some groundwork for NFS 4.1, which will eventually bring us pNFS.

Boot speed seems fast as ever, but I haven’t taken the time to do any empirical analysis.  Your results here will be hardware dependent but async initialization of certain subsystems is a welcome move in the right direction.

Basically, a solid release with a good balance of new stuff but mainly refinement of existing systems and merging of longstanding patches.

Kernel Newbies has, as usual, a great change summary: http://kernelnewbies.org/Linux_2_6_30

Kernel developers don’t get Xen

The recent brouhaha surrounding Xen on LKML (http://lkml.org/lkml/2009/6/2/475) is really disheartening.  Essentially, the Linux kernel devs are disconnected from users.  Some are proposing narrow-minded ideas such as DROPPING software paravirt or merging Xen as a whole into the kernel.

I use Xen for a few primary reasons:  it bar none has the best speed — full software paravirtualization pays dividends here;  it is mature;  it works on perfectly good machines that don’t happen to have the latest chips;  it does hardware passthrough on these same systems;  it has great live migration that actually works.

Ingo Molnar wants you to send all your perfectly good enterprise iron to the landfill even though these systems will last 10+ useful years without boneheaded software decisions such as this.

These same FUDsters want to strip the cross-platform nature of Xen dom0 out too.  Xen dom0 runs on NetBSD and Solaris.  It is a true hypervisor and will plug into existing architectures, and not force you to use Linux for everything.

I have to admire all the hoops Jeremy Fitzhardinge has jumped through to date, as I know my patience is wearing thin.

Xen powers huge sites such as Amazon and services like linode.com/slicehost.com.  By not having dom0 in the kernel where distros such as Ubuntu and Fedora can easily integrate it, kernel devs are doing a disservice to users.

I also use KVM, VMware, and VirtualBox at work, but Xen is firmly entrenched in my toolbox.  The roadmap they have looks great, and I just don’t see a reason for decline in Xen popularity.  High availability in Xen 4.0 is what I’ve always been waiting for.

Jeremy has gone to great lengths to work with upstream but keeps getting shot down and asked to do something else when he meets one requirement.  The solution is to merge Jeremy’s conservative dom0 patch set and work on a technical solution to the patches that the FUDsters consider bad.  It’s what the users want!