Building Your Own MMDB Database for Fun and Profit

Deprecation Notice

We have deprecated the Perl writer discussed in this article. It is no longer developed or supported. We encourage you to use our Go module, github.com/maxmind/mmdbwriter, instead. Please see our post on writing MMDB files using the Go programming language.

Introduction

If you use a GeoIP database, you’re probably familiar with MaxMind’s MMDB format.

At MaxMind, we created the MMDB format because we needed a format that was very fast and highly portable. MMDB comes with supported readers in many languages. In this blog post, we’ll create an MMDB file which contains an access list of IP addresses. This kind of database could be used when allowing access to a VPN or a hosted application.

Tools You’ll Need

The code samples I include here use the Perl MMDB database writer and the Perl MMDB database reader. You’ll need to use Perl to write your own MMDB files, but you can read the files with the officially supported .NET, PHP, Java and Python readers in addition to unsupported third-party MMDB readers. Many are listed on the GeoIP2 download page. So, as far as deployments go, you’re not constrained to any one language when you want to read from the database.

Following Along

Use our GitHub repository to follow along with the actual scripts. Fire up a pre-configured Vagrant VM or just install the required modules manually.

Getting Started

In our example, we want to create an access list of some IP addresses to allow them access to a VPN or a hosted application. For each IP address or IP range, we need to track a few things about the person who is connecting from this IP.

  • name
  • development environments to which they need access
  • an arbitrary session expiration time, defined in seconds

To do so, we create the following file, examples/01-getting-started.pl:

#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );

use MaxMind::DB::Writer::Tree;

my $filename = 'users.mmdb';

# Your top level data structure will always be a map (hash).  The MMDB format
# is strongly typed.  Describe your data types here.
# See https://metacpan.org/pod/MaxMind::DB::Writer::Tree#DATA-TYPES

my %types = (
    environments => [ 'array', 'utf8_string' ],
    expires      => 'uint32',
    name         => 'utf8_string',
);

my $tree = MaxMind::DB::Writer::Tree->new(

    # "database_type" is some arbitrary string describing the database.  At
    # MaxMind we use strings like 'GeoIP2-City', 'GeoIP2-Country', etc.
    database_type => 'My-IP-Data',

    # "description" is a hashref where the keys are language names and the
    # values are descriptions of the database in that language.
    description =>
        { en => 'My database of IP data', fr => "Mon Data d'IP", },

    # "ip_version" can be either 4 or 6
    ip_version => 4,

    # add a callback to validate data going in to the database
    map_key_type_callback => sub { $types{ $_[0] } },

    # "record_size" is the record size in bits.  Either 24, 28 or 32.
    record_size => 24,
);

my %address_for_employee = (
    '123.125.71.29/32' => {
        environments => [ 'development', 'staging', 'production' ],
        expires      => 86400,
        name         => 'Jane',
    },
    '8.8.8.8/28' => {
        environments => [ 'development', 'staging' ],
        expires      => 3600,
        name         => 'Klaus',
    },
);

for my $network ( keys %address_for_employee ) {
    $tree->insert_network( $network, $address_for_employee{$network} );
}

# Write the database to disk.
open my $fh, '>:raw', $filename or die "Could not open $filename: $!";
$tree->write_tree( $fh );
close $fh;

say "$filename has now been created";

The Code in Review

Step 1

Create a new MaxMind::DB::Writer::Tree object. The tree is where the database is stored in memory as it is created.

MaxMind::DB::Writer::Tree->new(...)

The options we’ve used are all commented in the script, but there are additional options. They’re all fully documented as well. To keep things simple (and easily readable), we used IPv4 to store addresses in this example, but you could also use IPv6.

We haven’t used all available types in this script. For example, we also could have used a map to store some of these values. You’re encouraged to review the full list of available types which can be used in map_key_type_callback.
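As a rough illustration of a richer schema, the sketch below adds a nested map to the %types hash. The contact, email and phone keys are hypothetical and are not part of the example scripts; the type names are the ones the writer documents. As far as I understand it, the callback is consulted by key name for keys at every level of the record, so nested keys live in the same flat lookup table. This version also dies on an undeclared key rather than silently returning undef.

```perl
use strict;
use warnings;

# "contact", "email" and "phone" are hypothetical keys added for illustration.
my %types = (
    contact      => 'map',
    email        => 'utf8_string',
    environments => [ 'array', 'utf8_string' ],
    expires      => 'uint32',
    name         => 'utf8_string',
    phone        => 'utf8_string',
);

# Same shape as the callback passed to the writer: look up the type by key
# name and fail loudly on a key we forgot to declare.
my $map_key_type_callback = sub {
    my $key = shift;
    return $types{$key} // die "No type defined for key: $key\n";
};

# A sample record using the nested map.
my %record = (
    name    => 'Jane',
    expires => 86400,
    contact => { email => 'jane@example.com' },
);

for my $key ( sort keys %record ) {
    my $type = $map_key_type_callback->($key);
    printf "%-8s => %s\n", $key,
        ref($type) ? join( ' of ', @{$type} ) : $type;
}
```

Running this standalone only exercises the lookup table; to use it for real, pass the same code reference as map_key_type_callback when constructing the tree.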

Step 2

For each IP address or range, we call the insert_network() method. This method takes two arguments. The first is a CIDR representation of the network. The second is a hash reference of values which describe the IP range.

$tree->insert_network( $network, $address_for_employee{$network} );

If you wish to insert an IP address range, use the `insert_range()` method instead:

$tree->insert_range( $first_ip, $last_ip, $address_for_employee{$network} );

We’ve inserted information about two employees, Jane and Klaus. They’re both on different IP ranges. You’ll see that Jane has access to more environments than Klaus has, but Klaus could theoretically connect from any of 16 different IP addresses (/28) whereas Jane will only connect from one (/32).

Step 3

Open a filehandle and then write the database to disk.

open my $fh, '>:raw', $filename or die "Could not open $filename: $!";
$tree->write_tree( $fh );
close $fh;

Let’s Do This

Now we’re ready to run the script.

perl examples/01-getting-started.pl

Your output should look something like:

users.mmdb has now been created

You should also see the file mentioned above in the folder from which you ran the script.

Reading the File

Now we have our brand new MMDB file. Let’s read the information we stored in it.

#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );

use Data::Printer;
use MaxMind::DB::Reader;

my $ip = shift @ARGV or die 'Usage: perl examples/02-reader.pl [ip_address]';

my $reader = MaxMind::DB::Reader->new( file => 'users.mmdb' );

say 'Description: ' . $reader->metadata->{description}->{en};

my $record = $reader->record_for_address( $ip );
say np $record;

Reading the File: Review

Step 1

Ensure that the user has provided an IP address via the command line.

my $ip = shift @ARGV or die 'Usage: perl examples/02-reader.pl [ip_address]';

Step 2

We create a new MaxMind::DB::Reader object, using the name of the file we just created as the sole argument.

my $reader = MaxMind::DB::Reader->new( file => 'users.mmdb' );

Step 3

Check the metadata. This is optional, but here we print the description we added to the metadata in the previous script.

say 'Description: ' . $reader->metadata->{description}->{en};

Much more metadata is available in addition to the description. $reader->metadata returns a MaxMind::DB::Metadata object which provides much more information about the file you created.

Step 4

We perform a record lookup and dump it using Data::Printer’s handy np() function.

my $record = $reader->record_for_address( $ip );
say np $record;

Running the Script

Now let’s run the script and perform a lookup on Jane’s IP address:

perl examples/02-reader.pl 123.125.71.29

Your output should look something like this:

vagrant@precise64:/vagrant$ perl examples/02-reader.pl 123.125.71.29
Description: My database of IP data
\ {
    environments   [
        [0] "development",
        [1] "staging",
        [2] "production"
    ],
    expires        86400,
    name           "Jane"
}

We see that our description and our hash of user data are returned exactly as we initially provided them. But what about Klaus? Is he also in the database?

vagrant@precise64:/vagrant$ perl examples/02-reader.pl 8.8.8.0
Description: My database of IP data
\ {
    environments   [
        [0] "development",
        [1] "staging"
    ],
    expires        3600,
    name           "Klaus"
}
vagrant@precise64:/vagrant$ perl examples/02-reader.pl 8.8.8.15
Description: My database of IP data
\ {
    environments   [
        [0] "development",
        [1] "staging"
    ],
    expires        3600,
    name           "Klaus"
}
vagrant@precise64:/vagrant$ perl examples/02-reader.pl 8.8.8.16
Description: My database of IP data
undef

We gave Klaus an IP range of 8.8.8.8/28, which translates to 8.8.8.0 to 8.8.8.15. You can see that when we get to 8.8.8.16 we get an undef response, because there is no record at this address.
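If you want to double-check that arithmetic, here is a minimal sketch using only the core Socket module. The cidr_bounds helper is hypothetical and is not part of the writer or reader; it handles IPv4 only and does no input validation beyond what inet_aton provides.

```perl
use strict;
use warnings;
use Socket qw( inet_aton inet_ntoa );

# Return the first and last address of an IPv4 CIDR block as strings.
sub cidr_bounds {
    my $cidr = shift;
    my ( $ip, $prefix ) = split m{/}, $cidr;

    my $packed = inet_aton($ip) or die "Invalid IP: $ip\n";
    my $int = unpack 'N', $packed;

    # Build a 32-bit netmask with $prefix leading one bits.
    my $mask
        = $prefix == 0
        ? 0
        : ( 0xFFFFFFFF << ( 32 - $prefix ) ) & 0xFFFFFFFF;

    my $first = $int & $mask;
    my $last  = $first | ( ~$mask & 0xFFFFFFFF );

    return map { inet_ntoa( pack 'N', $_ ) } $first, $last;
}

my ( $first, $last ) = cidr_bounds('8.8.8.8/28');
print "$first - $last\n";    # 8.8.8.0 - 8.8.8.15
```

The host bits of 8.8.8.8 are masked away, which is why the range starts at 8.8.8.0 rather than at the address we typed.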

Iterating Over the Search Tree

It takes time to look up every address individually. Is there a way to speed things up? As it happens, there is.

#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );

use Data::Printer;
use MaxMind::DB::Reader;
use Net::Works::Address;

my $reader = MaxMind::DB::Reader->new( file => 'users.mmdb' );

$reader->iterate_search_tree(
    sub {
        my $ip_as_integer = shift;
        my $mask_length   = shift;
        my $data          = shift;

        my $address = Net::Works::Address->new_from_integer(
            integer => $ip_as_integer );
        say join '/', $address->as_ipv4_string, $mask_length;
        say np $data;
    }
);

Iterating: Review

Step 1

As in the previous example, we create a new MaxMind::DB::Reader object.

Step 2

To dump our data, we pass an anonymous subroutine to the iterate_search_tree() method. (This method can actually take two callbacks, but the second callback is for debugging the actual nodes in the tree. That’s too low level for our purposes today.)

We’ve appropriately named the three arguments which are passed to the callback, so there’s not much more to say about them. Let’s look at the output.

vagrant@precise64:/vagrant$ perl examples/03-iterate-search-tree.pl
8.8.8.0/28
\ {
    environments   [
        [0] "development",
        [1] "staging"
    ],
    expires        3600,
    name           "Klaus"
}
123.125.71.29/32
\ {
    environments   [
        [0] "development",
        [1] "staging",
        [2] "production"
    ],
    expires        86400,
    name           "Jane"
}

The output shows the first IP in each range (note that Jane’s IP is just a “range” of one) and then displays the user data with which we’re now familiar.
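The callback receives the start of each network as an integer, which is why the script hands it to Net::Works::Address. For an IPv4-only database, the same conversion can be sketched with the core Socket module. The ipv4_from_integer helper below is hypothetical, and it would not cope with the 128-bit integers you would get from an IPv6 tree.

```perl
use strict;
use warnings;
use Socket qw( inet_ntoa );

# Convert an unsigned 32-bit integer to a dotted-quad IPv4 string.
sub ipv4_from_integer {
    my $integer = shift;
    return inet_ntoa( pack 'N', $integer );
}

# 134744064 is 8.8.8.0 expressed as an integer.
print ipv4_from_integer(134744064), "\n";    # 8.8.8.0
```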

The Mashup

To extend our example, let’s take the data from an existing GeoIP2 database and combine it with our custom MMDB file.

If you’re using the Vagrant VM, you have a copy of GeoLite2-City.mmdb in /usr/share/GeoIP. If not, you may need to download this file or use geoipupdate. For more details on how to set this up, you can look at the provision section of the Vagrantfile in the GitHub repository.

You can take any number of fields from existing MaxMind databases to create your own custom database. In this case, let’s extend our existing database by adding city, country and time_zone fields for each IP range. We can use this information to (possibly) customize the user’s environment. We can use the time zone when displaying dates or times. We can limit access to certain features based on the country in which the user is currently located.

#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );

use GeoIP2::Database::Reader;
use MaxMind::DB::Writer::Tree;
use Net::Works::Network;

my $filename = 'users.mmdb';
my $reader   = GeoIP2::Database::Reader->new(
    file    => '/usr/share/GeoIP/GeoLite2-City.mmdb',
    locales => ['en'],
);

# Your top level data structure will always be a map (hash).  The MMDB format
# is strongly typed.  Describe your data types here.
# See https://metacpan.org/pod/MaxMind::DB::Writer::Tree#DATA-TYPES

my %types = (
    city         => 'utf8_string',
    country      => 'utf8_string',
    environments => [ 'array', 'utf8_string' ],
    expires      => 'uint32',
    name         => 'utf8_string',
    time_zone    => 'utf8_string',
);

my $tree = MaxMind::DB::Writer::Tree->new(

    # "database_type" is an arbitrary string describing the database.  At
    # MaxMind we use strings like 'GeoIP2-City', 'GeoIP2-Country', etc.
    database_type => 'My-IP-Data',

    # "description" is a hashref where the keys are language names and the
    # values are descriptions of the database in that language.
    description =>
        { en => 'My database of IP data', fr => "Mon Data d'IP", },

    # "ip_version" can be either 4 or 6
    ip_version => 4,

    # add a callback to validate data going in to the database
    map_key_type_callback => sub { $types{ $_[0] } },

    # let the writer handle merges of IP ranges. if we don't set this then the
    # default behaviour is for the last network to clobber any overlapping
    # ranges.
    merge_record_collisions => 1,

    # "record_size" is the record size in bits.  Either 24, 28 or 32.
    record_size => 24,
);

my %address_for_employee = (
    '123.125.71.29/32' => {
        environments => [ 'development', 'staging', 'production' ],
        expires      => 86400,
        name         => 'Jane',
    },
    '8.8.8.8/28' => {
        environments => [ 'development', 'staging' ],
        expires      => 3600,
        name         => 'Klaus',
    },
);

for my $range ( keys %address_for_employee ) {

    my $user_metadata = $address_for_employee{$range};

    # Iterate over network and insert IPs individually
    my $network = Net::Works::Network->new_from_string( string => $range );
    my $iterator = $network->iterator;

    while ( my $address = $iterator->() ) {
        my $ip = $address->as_ipv4_string;
        my $model = $reader->city( ip => $ip );

        # Copy the user data so that GeoIP2 values found for one address
        # don't leak into the record for the next one.
        my %record = %{$user_metadata};

        if ( $model->city->name ) {
            $record{city} = $model->city->name;
        }
        if ( $model->country->name ) {
            $record{country} = $model->country->name;
        }
        if ( $model->location->time_zone ) {
            $record{time_zone} = $model->location->time_zone;
        }

        # Insert this single address as a /32 network.
        $tree->insert_network( "$ip/32", \%record );
    }
}

# Write the database to disk.
open my $fh, '>:raw', $filename or die "Could not open $filename: $!";
$tree->write_tree( $fh );
close $fh;

say "$filename has now been created";

Now, when we iterate over the search tree, we’ll see that the data has been augmented with the new fields.

vagrant@precise64:/vagrant$ perl examples/03-iterate-search-tree.pl
8.8.8.0/28
\ {
    city           "Mountain View",
    country        "United States",
    environments   [
        [0] "development",
        [1] "staging"
    ],
    expires        3600,
    name           "Klaus",
    time_zone      "America/Los_Angeles"
}
123.125.71.29/32
\ {
    city           "Beijing",
    country        "China",
    environments   [
        [0] "development",
        [1] "staging",
        [2] "production"
    ],
    expires        86400,
    name           "Jane",
    time_zone      "Asia/Shanghai"
}

Adding GeoLite2-City Data: Review

To extend our example we make two additions to our original file:

Step 1

We create a new reader object:

my $reader   = GeoIP2::Database::Reader->new(
    file    => '/usr/share/GeoIP/GeoLite2-City.mmdb',
    locales => ['en'],
);

Note that this file may be in a different location if you’re not using Vagrant. Adjust accordingly.

Step 2

Now, we take our existing data so that we can augment it with GeoIP2 data.

    my $user_metadata = $address_for_employee{$range};

    # Iterate over network and insert IPs individually
    my $network = Net::Works::Network->new_from_string( string => $range );
    my $iterator = $network->iterator;

    while ( my $address = $iterator->() ) {
        my $ip = $address->as_ipv4_string;
        my $model = $reader->city( ip => $ip );

        # Copy the user data so that GeoIP2 values found for one address
        # don't leak into the record for the next one.
        my %record = %{$user_metadata};

        if ( $model->city->name ) {
            $record{city} = $model->city->name;
        }
        if ( $model->country->name ) {
            $record{country} = $model->country->name;
        }
        if ( $model->location->time_zone ) {
            $record{time_zone} = $model->location->time_zone;
        }

        # Insert this single address as a /32 network.
        $tree->insert_network( "$ip/32", \%record );
    }

Here we create a new Net::Works::Network object for each range so that we can iterate over it, because in this case we are going to insert each individual IP in the range. The reason for this is that we don’t know if our IP ranges match the ranges in the GeoLite2 database. If we just rely on using the reader data for some arbitrary IP in the range, we can’t be 100% sure that this is representative of all other IPs in the range. If we insert each IP in the range, we don’t need to rely on the assumption that the data for a random IP will be consistent across our ranges.

In order for this to work, we set merge_record_collisions => 1 when we created the MaxMind::DB::Writer::Tree object. This allows the writer to be smart about merging ranges rather than letting a new range clobber any overlapping addresses.

Note that this approach is fine for a small database, but it likely will not scale well in terms of speed when writing a database with a large number of records. If you’re looking to create a very large database and writing speed is an issue, you are encouraged to look into using the MaxMind CSVs to seed your database.

Iterating over a network is trivial.

    my $network = Net::Works::Network->new_from_string( string => $range );
    my $iterator = $network->iterator;

    while ( my $address = $iterator->() ) {
        my $ip = $address->as_ipv4_string;
        ...
    }

The next step is to look up an IP address using the reader.

my $model = $reader->city( ip => $ip );

We need to pass the reader a string rather than an object, so we call the as_ipv4_string() method.

Next we add new keys to the hash. The new keys are country, city and time_zone. Note that we only add them if they exist. If we try to add an undefined value to the hash, an exception will be thrown.
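The "only add it if it exists" pattern can be sketched without any GeoIP2 objects at all. In the snippet below, add_defined_fields is a hypothetical helper and %lookup is a stand-in for values read from a model. Note that it tests for definedness, which is slightly stricter about what it skips than the truthiness checks in the script.

```perl
use strict;
use warnings;

# Copy only the defined values from %$lookup into %$record.
sub add_defined_fields {
    my ( $record, $lookup ) = @_;
    for my $key ( sort keys %{$lookup} ) {
        next unless defined $lookup->{$key};
        $record->{$key} = $lookup->{$key};
    }
    return $record;
}

my %record = ( name => 'Jane', expires => 86400 );

# Stand-in for values read from a GeoIP2 model; city is deliberately missing.
my %lookup = (
    city      => undef,
    country   => 'China',
    time_zone => 'Asia/Shanghai',
);

add_defined_fields( \%record, \%lookup );
print join( ', ', sort keys %record ), "\n";  # country, expires, name, time_zone
```

Because the city key is never created, the writer is never asked to store an undefined value for it.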

Now, let’s see what we get.

vagrant@precise64:/vagrant$ perl examples/03-iterate-search-tree.pl
8.8.8.0/28
\ {
    city           "Mountain View",
    country        "United States",
    environments   [
        [0] "development",
        [1] "staging"
    ],
    expires        3600,
    name           "Klaus",
    time_zone      "America/Los_Angeles"
}
123.125.71.29/32
\ {
    city           "Beijing",
    country        "China",
    environments   [
        [0] "development",
        [1] "staging",
        [2] "production"
    ],
    expires        86400,
    name           "Jane",
    time_zone      "Asia/Shanghai"
}

Even though we inserted Klaus’s addresses individually, we can see that the writer did the right thing and merged the addresses into an appropriately sized network.
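To get a feel for why sixteen individually inserted /32 records can come back as a single /28, here is a toy sketch in plain Perl. It is not the writer's actual algorithm, and merge_siblings is a hypothetical helper. It simply keeps collapsing pairs of adjacent, aligned networks that carry identical data, using a plain string in place of a real record.

```perl
use strict;
use warnings;
use Socket qw( inet_aton inet_ntoa );

# Toy illustration only: merge sibling networks that carry identical data.
# Takes and returns a hash of "start_integer/prefix" => data string.
sub merge_siblings {
    my %blocks = @_;

    for my $prefix ( reverse 1 .. 32 ) {
        my $size = 2**( 32 - $prefix );
        for my $key ( sort keys %blocks ) {
            next unless exists $blocks{$key};    # already merged away
            my ( $start, $len ) = split m{/}, $key;
            next unless $len == $prefix;

            # Only the lower half of a pair starts on the parent's boundary.
            next if $start % ( $size * 2 );

            my $sibling = join '/', $start + $size, $prefix;
            next
                unless exists $blocks{$sibling}
                && $blocks{$sibling} eq $blocks{$key};

            # Replace the two halves with their parent network.
            my $data = delete $blocks{$key};
            delete $blocks{$sibling};
            $blocks{ join '/', $start, $prefix - 1 } = $data;
        }
    }
    return %blocks;
}

# Insert 8.8.8.0 through 8.8.8.15 as sixteen /32 networks.
my $base = unpack 'N', inet_aton('8.8.8.0');
my %merged
    = merge_siblings( map { ( "$_/32" => 'Klaus' ) } $base .. $base + 15 );

for my $key ( sort keys %merged ) {
    my ( $start, $len ) = split m{/}, $key;
    print inet_ntoa( pack 'N', $start ), "/$len => $merged{$key}\n";
}
```

If even one of the sixteen addresses carried different data, the pairs containing it would stop merging at that level and the result would be several smaller networks instead of one /28.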

Deploying Our Application

Now we’re at the point where we can make use of our database. With just a few lines of code you can now use your MMDB file to assist in the authorization of your application or VPN users. For example, you might include the following lines in a class which implements your authentication.

use MaxMind::DB::Reader;

my $reader = MaxMind::DB::Reader->new( file => '/path/to/users.mmdb' );

sub is_ip_valid {
    my $self   = shift;
    my $ip     = shift;

    my $record = $reader->record_for_address( $ip );
    return 0 unless $record;

    $self->set_session_expiration( $record->{expires} );
    $self->set_time_zone( $record->{time_zone} ) if $record->{time_zone};
    return 1;
}

Here’s a quick summary of what’s going on:

  • As part of your deployment you’ll naturally need to include your users.mmdb file, stored in the location of your choice.
  • You’ll need to create a MaxMind::DB::Reader object to perform the lookup.
  • If the $record is undef, the IP could not be found.
  • If the IP is found, you can set a session expiration.
  • If the IP is found, you can also set a time zone for the user. Keep in mind that it’s possible that the time_zone key does not exist, so it’s important that you don’t assume it will always be available.
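The record also carries the list of environments, which the snippet above does not use. One way to check it might look like the sketch below. The is_environment_allowed helper is hypothetical, and the record is hard-coded in the shape returned for Klaus so that the example runs without a database.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the record grants access to $environment.
sub is_environment_allowed {
    my ( $record, $environment ) = @_;
    return 0 unless $record && ref( $record->{environments} ) eq 'ARRAY';
    return ( grep { $_ eq $environment } @{ $record->{environments} } )
        ? 1
        : 0;
}

# A record shaped like the one stored for Klaus.
my $record = {
    environments => [ 'development', 'staging' ],
    expires      => 3600,
    name         => 'Klaus',
};

for my $environment (qw( staging production )) {
    my $verdict
        = is_environment_allowed( $record, $environment )
        ? 'allowed'
        : 'denied';
    print "$environment: $verdict\n";
}
```

Passing undef as the record, as you would for an IP that is not in the database, returns 0 rather than dying.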

Pro Tips

Including the Contents of an Entire MaxMind DB

To include the contents of an entire GeoIP2 database rather than selected data points, you have a couple of options for iterating over a database in Perl.

MaxMind::DB::Reader

A very simple way to get started is to iterate over the search tree using MaxMind::DB::Reader as we did in examples/03-iterate-search-tree.pl. However, note that iterating over the entire tree using the Perl reader can be quite slow.

Parsing a CSV

This requires slightly more logic, but reading a CSV file line by line will give you a significant speed boost over search tree iteration.

Free downloads of CSV files for GeoLite2 City and GeoLite2 Country are available from MaxMind.com. If you’re using the Vagrant VM, you’ll find GeoLite2-City-Blocks-IPv4.csv and GeoLite2-City-Locations-en.csv already in your /vagrant directory. examples/06-read-csv.pl will give you a head start on parsing these CSVs.

Insert Order, Merging and Overwriting

It’s important to understand MaxMind::DB::Writer’s configurable behaviour for inserting ranges. Please see our documentation on Insert Order, Merging and Overwriting so that you can choose the correct behaviour for any overlapping IP ranges you may come across when writing your own database files.

Taking This Further

Today we’ve shown how you can create your own MMDB database and augment it with data from a GeoLite2-City database. We’ve only included a few data points, but MaxMind databases contain much more data you can use to build a solution to meet your business requirements.

About our contributor: Olaf Alders is a Senior Software Engineer at MaxMind. After taking his first course in Fortran, Olaf earned an M.A. in Classical Philology from McMaster University and an M.A. in Medieval Studies from the University of Toronto. His open source projects include MetaCPAN.org, as well as various Perl modules. Follow him on Twitter @olafalders.