Creating a local mirror of Ubuntu’s most popular packages


The problem

You want to create local mirrors of the apt repositories that you use but you don’t have enough hard drive space to mirror every package. Or maybe you have a slow link and you don’t want to spend time downloading packages that you’re unlikely to need.

The solution

Only mirror packages whose popularity (as reported by popcon’s “installed” metric) meets a chosen threshold.

The explanation

I’ve been hacking without a network connection recently and one of the biggest pain points is not having access to my distro’s software package repository.

For example, while writing some Python screen-scraping code last week I realized I didn’t have the Python library I wanted to parse some HTML with – Beautiful Soup. Rather than postpone my work on the script until I found a weefee signal, it would have been nice to simply install the package from a local mirror of the repository.

I soon discovered two common tools that can be used to create a local mirror of a repository – Frans Pop’s debmirror and Dmitriy Khramtsov’s apt-mirror.

I chose apt-mirror, skimmed Alan Pope’s handy step-by-step guide and kicked off the mirror script…

$ sudo -u apt-mirror apt-mirror

[...]
52.7 GiB will be downloaded into archive.
Downloading 75 archive files using 10 threads...

ACK! That’s a lot of gibibytes.

Eventually I’d like a complete mirror, but for now, I only want the packages I’m likely to need. My broadband connection isn’t as “broad” as I would like.

The Debian Popularity Contest (“popcon”) came to mind and sure enough, Ubuntu also provides a flat text file (`by_inst`) listing every package, sorted by how many reporting users have it installed.
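
To make the file format concrete, here is a minimal sketch of pulling the top N package names out of popcon data. It assumes each data line starts with a numeric rank followed by the package name, and that header lines start with `#`. The sub name `top_packages` and the sample counts are made up for illustration; this is not part of the patch.

```perl
use strict;
use warnings;

# Return the names of the first $threshold ranked packages.
# Assumed line layout: rank name inst vote ... (header lines start with '#').
sub top_packages {
    my ($lines, $threshold) = @_;
    my @names;
    for my $line (@$lines) {
        last if @names >= $threshold;
        # split ' ' splits on runs of whitespace and ignores leading whitespace
        my ($rank, $name) = split ' ', $line;
        # skip blank lines, header lines and anything without a numeric rank
        next unless defined $name && $rank =~ /^\d+$/;
        push @names, $name;
    }
    return @names;
}

# Made-up sample data in the assumed layout.
my @sample = (
    "#rank name                 inst  vote\n",
    "1     dpkg                 1000   900\n",
    "2     libc6                 999   950\n",
    "3     cowsay                 42    10\n",
);
print "$_\n" for top_packages(\@sample, 2);
```

Keying on the numeric rank rather than on a fixed count of header lines means the sketch keeps working if the header length changes.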

I downloaded this file and hacked up the primary apt-mirror Perl script to consult it, only mirroring binary and source packages that meet a chosen popularity threshold.

Here’s the meat from a patch that applies cleanly to apt-mirror version 0.4.5-1ubuntu2:


sub should_process {
    # print "should_process()\n";
    my $pkg_name     = shift;
    my $section_name = shift;
    my @popular_pkgs = @{ $_[0] };    # third argument: reference to the list of popular pkgs

    # if the pkg isn't in the 'game' section...
    if ( $section_name !~ /game/ ) {
        my %is_popular;
        for (@popular_pkgs) { $is_popular{$_} = 1 }

        if ( $is_popular{$pkg_name} ) {
            # print "processing popular pkg: " . $pkg_name . "\n";
            return 1;
        } else {
            # print "skipping unpopular pkg: " . $pkg_name . "\n";
            return 0;
        }
    } else {
        # print "skipping game pkg: " . $pkg_name . "\n";
        return 0;
    }
}

# [...]

# open our popcon database
my $db_path = "/home/tz/Desktop/by_inst";
open(FILE, "<", $db_path) or die "Can't open popcon db: $!";
my @data = <FILE>; # beware record separator ($/) tweak below
close FILE;
my $num_comment_lines = 11;
my $threshold = 3000;
my @popular_pkgs;
# for each of the first $threshold lines after the header, grab pkg name
my $last_line = $num_comment_lines + $threshold - 1;
$last_line = $#data if $last_line > $#data;    # don't run past the end of the file
foreach my $cur_line (@data[$num_comment_lines .. $last_line]) {
    # print "cur_line: $cur_line";
    my @tokens = split / +/, $cur_line;    # $tokens[0] is the rank, $tokens[1] the pkg name
    # print "pkgname: " . $tokens[1] . "\n";
    push( @popular_pkgs, $tokens[1] );
}

# [...]

# pass the list by reference; should_process() dereferences its third argument
if( should_process( $lines{"Package:"}, $lines{"Section:"}, \@popular_pkgs ) ) {
    add_url_to_download($uri . "/" . $lines{"Directory:"} . "/" . $file[2], $file[1]);
}

Tweak the path to the flat file (`$db_path`) and the threshold (`$threshold`, the number of top-ranked packages to mirror) to suit your needs.
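
One possible refinement, sketched here rather than taken from the patch: build the lookup hash once and hand a reference to it to the check, instead of rebuilding the hash for every package. The names `build_popular_set` and `should_process_fast` are hypothetical, and the package names in the demo are only examples.

```perl
use strict;
use warnings;

# Build the name => 1 lookup a single time, up front.
sub build_popular_set {
    my @names = @_;
    my %set;
    $set{$_} = 1 for @names;
    return \%set;
}

# Same policy as should_process(): skip anything whose section matches
# /game/, otherwise keep the package only if it is in the popular set.
sub should_process_fast {
    my ($pkg_name, $section_name, $popular) = @_;
    return 0 if defined $section_name && $section_name =~ /game/;
    return exists $popular->{$pkg_name} ? 1 : 0;
}

my $popular = build_popular_set(qw(dpkg python-beautifulsoup));
for my $case ( [ 'python-beautifulsoup', 'python' ],
               [ 'cowsay',               'games'  ],
               [ 'obscure-pkg',          'misc'   ] ) {
    my ($pkg, $section) = @$case;
    my $verdict = should_process_fast($pkg, $section, $popular) ? 'mirror' : 'skip';
    printf "%-22s %s\n", $pkg, $verdict;
}
```

Each check is then a single hash lookup, which matters when the package index holds tens of thousands of entries.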

 ________________________________________
/ As you can see, I also modified the    \
| script to skip games. Games tend to be |
| large and there aren't many that I use |
\ often, except perhaps cowsay(1) :]     /
 ----------------------------------------
    \               ,-----._
  .  \         .  ,'        `-.__,------._
 //   \      __\\'                        `-.
((    _____-'___))                           |
 `:='/     (alf_/                            |
 `.=|      |='                               |
    |)   O |                                  \
    |      |                               /\  \
    |     /                          .    /  \  \
    |    .-..__            ___   .--' \  |\   \  |
   |o o  |     ``--.___.  /   `-'      \  \\   \ |
    `--''        '  .' / /             |  | |   | \
                 |  | / /              |  | |   mmm
                 |  ||  |              | /| |
                 ( .' \ \              || | |
                 | |   \ \            // / /
                 | |    \ \          || |_|
                /  |    |_/         /_|
               /__/

Future improvements

  • Download popcon db file, rather than expect that it already exists on disk.
  • Read desired popularity threshold from mirror.list rather than using a hard-coded value.
  • Read the sections to include or skip from mirror.list as well.
  • Speed holes! My perl-fu is weak.
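
As a rough sketch of the second item, the threshold could be picked up from a `set` line in mirror.list. The setting name `popcon_threshold` is invented for this example (stock apt-mirror does not define it), as is the sub name `read_threshold`.

```perl
use strict;
use warnings;

# Scan config lines for "set popcon_threshold N" (a hypothetical setting)
# and fall back to $default when no such line is present.
sub read_threshold {
    my ($config_lines, $default) = @_;
    my $threshold = $default;
    for my $line (@$config_lines) {
        next if $line =~ /^\s*#/;    # ignore commented-out lines
        if ( $line =~ /^\s*set\s+popcon_threshold\s+(\d+)\s*$/ ) {
            $threshold = $1;         # a later line overrides an earlier one
        }
    }
    return $threshold;
}

# Example lines in the "set name value" style that mirror.list uses.
my @config = (
    "set nthreads 20\n",
    "set popcon_threshold 5000\n",
);
print "threshold: ", read_threshold(\@config, 3000), "\n";
```

Keeping the hard-coded 3000 as the fallback means an unmodified mirror.list would behave as the patch does today.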

Tips

  • I found a faster mirror halfway through creating my local mirror. Renaming `/var/spool/apt-mirror/{mirror,skel}/${OLD_MIRROR}` to `/var/spool/apt-mirror/{mirror,skel}/${NEW_MIRROR}` was sufficient.
  • If you try to install a package that doesn’t exist in your local mirror, you’ll get a 404 error – nothing catastrophic happens.
  • Beware permissions issues. Run apt-mirror as the prescribed `apt-mirror` user rather than as root.
  • debmirror has an `--exclude-deb-section` option.
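
The rename from the first tip can be scripted. The following is a minimal sketch, assuming the per-host directories sit directly under `mirror/` and `skel/` in the base path; the sub name `switch_mirror` is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Rename the per-host directory under both mirror/ and skel/.
# Returns the list of directories that were actually renamed.
sub switch_mirror {
    my ($base, $old_host, $new_host) = @_;
    my @renamed;
    for my $tree (qw(mirror skel)) {
        my $from = File::Spec->catdir($base, $tree, $old_host);
        my $to   = File::Spec->catdir($base, $tree, $new_host);
        next unless -d $from;                    # nothing stored under this tree
        die "$to already exists\n" if -e $to;    # refuse to clobber an existing tree
        rename $from, $to or die "rename $from -> $to failed: $!\n";
        push @renamed, $to;
    }
    return @renamed;
}
```

A call would look like `switch_mirror('/var/spool/apt-mirror', $old_mirror, $new_mirror)`, run as a user with write access to the spool directory, where the two host names are whatever appears under `mirror/` on your system.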

Reader Comments

Woot, tyler in Cyber Space.

Nice post.

Oh, and:

First!

King vitaman returns from Outer Space! (whatsupbill)