Building Your Own MMDB Database for Fun and Profit
Deprecation Notice
We have deprecated the Perl writer discussed in this article. It is no longer developed or supported. We encourage you to use our Go github.com/maxmind/mmdbwriter module instead. Please see our post on writing MMDB files using the Go programming language.
Introduction
If you use a GeoIP database, you’re probably familiar with MaxMind’s MMDB format.
At MaxMind, we created the MMDB format because we needed a format that was very fast and highly portable. MMDB comes with supported readers in many languages. In this blog post, we’ll create an MMDB file which contains an access list of IP addresses. This kind of database could be used when allowing access to a VPN or a hosted application.
Tools You’ll Need
The code samples I include here use the Perl MMDB database writer and the Perl MMDB database reader. You’ll need to use Perl to write your own MMDB files, but you can read the files with the officially supported .NET, PHP, Java and Python readers in addition to unsupported third-party MMDB readers. Many are listed on the GeoIP2 download page. So, as far as deployments go, you’re not constrained to any one language when you want to read from the database.
Following Along
Use our GitHub repository to follow along with the actual scripts. Fire up a pre-configured Vagrant VM or just install the required modules manually.
Getting Started
In our example, we want to create an access list of some IP addresses to allow them access to a VPN or a hosted application. For each IP address or IP range, we need to track a few things about the person who is connecting from this IP.
- name
- development environments to which they need access
- an arbitrary session expiration time, defined in seconds
To do so, we create the following file, examples/01-getting-started.pl:
#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );

use MaxMind::DB::Writer::Tree;

my $filename = 'users.mmdb';

# Your top level data structure will always be a map (hash). The MMDB format
# is strongly typed. Describe your data types here.
# See https://metacpan.org/pod/MaxMind::DB::Writer::Tree#DATA-TYPES

my %types = (
    environments => [ 'array', 'utf8_string' ],
    expires      => 'uint32',
    name         => 'utf8_string',
);

my $tree = MaxMind::DB::Writer::Tree->new(

    # "database_type" is some arbitrary string describing the database. At
    # MaxMind we use strings like 'GeoIP2-City', 'GeoIP2-Country', etc.
    database_type => 'My-IP-Data',

    # "description" is a hashref where the keys are language names and the
    # values are descriptions of the database in that language.
    description =>
        { en => 'My database of IP data', fr => "Mon Data d'IP", },

    # "ip_version" can be either 4 or 6
    ip_version => 4,

    # add a callback to validate data going in to the database
    map_key_type_callback => sub { $types{ $_[0] } },

    # "record_size" is the record size in bits. Either 24, 28 or 32.
    record_size => 24,
);

my %address_for_employee = (
    '123.125.71.29/32' => {
        environments => [ 'development', 'staging', 'production' ],
        expires      => 86400,
        name         => 'Jane',
    },
    '8.8.8.8/28' => {
        environments => [ 'development', 'staging' ],
        expires      => 3600,
        name         => 'Klaus',
    },
);

for my $network ( keys %address_for_employee ) {
    $tree->insert_network( $network, $address_for_employee{$network} );
}

# Write the database to disk, failing loudly if the file can't be written.
open my $fh, '>:raw', $filename or die "Cannot open $filename: $!";
$tree->write_tree( $fh );
close $fh or die "Cannot close $filename: $!";

say "$filename has now been created";
The Code in Review
Step 1
Create a new MaxMind::DB::Writer::Tree object. The tree is where the database is stored in memory as it is created.
MaxMind::DB::Writer::Tree->new(...)
The options we’ve used are all commented in the script, but there are additional options. They’re all fully documented as well. To keep things simple (and easily readable), we used IPv4 to store addresses in this example, but you could also use IPv6.
We haven’t used all available types in this script. For example, we also could have used a map to store some of these values. You’re encouraged to review the full list of available types which can be used in map_key_type_callback.
Step 2
For each IP address or range, we call the insert_network() method. This method takes two arguments. The first is a CIDR representation of the network. The second is a hash reference of values which describe the IP range.
$tree->insert_network( $network, $address_for_employee{$network} );
If you wish to insert an IP address range, use the `insert_range()` method instead:
$tree->insert_range( $first_ip, $last_ip, $address_for_employee{$network} );
We’ve inserted information about two employees, Jane and Klaus. They’re both on different IP ranges. You’ll see that Jane has access to more environments than Klaus has, but Klaus could theoretically connect from any of 16 different IP addresses (/28) whereas Jane will only connect from one (/32).
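The address counts above follow from the prefix length: an IPv4 network with an n-bit prefix covers 2^(32 - n) addresses. Here is a minimal sketch of that arithmetic in plain Perl. The helper name ipv4_address_count is made up for illustration and is not part of the MMDB writer.

```perl
use strict;
use warnings;
use feature qw( say );

# Hypothetical helper: number of IPv4 addresses covered by a prefix length.
sub ipv4_address_count {
    my $prefix_length = shift;
    die "prefix length must be between 0 and 32\n"
        if $prefix_length < 0 || $prefix_length > 32;
    return 2**( 32 - $prefix_length );
}

say '/28 covers ', ipv4_address_count(28), ' addresses';    # Klaus
say '/32 covers ', ipv4_address_count(32), ' address';      # Jane
```

Run as is, this prints 16 for the /28 and 1 for the /32, matching the counts described for Klaus and Jane.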
Step 3
Open a filehandle and then write the database to disk.
open my $fh, '>:raw', $filename or die "Cannot open $filename: $!";
$tree->write_tree( $fh );
close $fh or die "Cannot close $filename: $!";
Let’s Do This
Now we’re ready to run the script.
perl examples/01-getting-started.pl
Your output should look something like:
users.mmdb has now been created
You should also see the file mentioned above in the folder from which you ran the script.
Reading the File
Now we have our brand new MMDB file. Let’s read the information we stored in it.
#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );

use Data::Printer;
use MaxMind::DB::Reader;

my $ip = shift @ARGV or die 'Usage: perl examples/02-reader.pl [ip_address]';

my $reader = MaxMind::DB::Reader->new( file => 'users.mmdb' );

say 'Description: ' . $reader->metadata->{description}->{en};

my $record = $reader->record_for_address( $ip );
say np $record;
Reading the File: Review
Step 1
Ensure that the user has provided an IP address via the command line.
my $ip = shift @ARGV or die 'Usage: perl examples/02-reader.pl [ip_address]';
Step 2
We create a new MaxMind::DB::Reader object, using the name of the file we just created as the sole argument.
my $reader = MaxMind::DB::Reader->new( file => 'users.mmdb' );
Step 3
Check the metadata. This is optional, but here we print the description we added to the metadata in the previous script.
say 'Description: ' . $reader->metadata->{description}->{en};
Much more metadata is available in addition to the description. $reader->metadata returns a MaxMind::DB::Metadata object which provides much more information about the file you created.
Step 4
We perform a record lookup and dump it using Data::Printer’s handy np() function.
my $record_for_jane = $reader->record_for_address( '123.125.71.29' );
say np $record_for_jane;
Running the Script
Now let’s run the script and perform a lookup on Jane’s IP address:
perl examples/02-reader.pl 123.125.71.29
Your output should look something like this:
vagrant@precise64:/vagrant$ perl examples/02-reader.pl 123.125.71.29
Description: My database of IP data
\ {
    environments [
        [0] "development",
        [1] "staging",
        [2] "production"
    ],
    expires 86400,
    name "Jane"
}
We see that our description and our hash of user data are returned exactly as we initially provided them. But what about Klaus? Is he also in the database?
vagrant@precise64:/vagrant$ perl examples/02-reader.pl 8.8.8.0
Description: My database of IP data
\ {
    environments [
        [0] "development",
        [1] "staging"
    ],
    expires 3600,
    name "Klaus"
}
vagrant@precise64:/vagrant$ perl examples/02-reader.pl 8.8.8.15
Description: My database of IP data
\ {
    environments [
        [0] "development",
        [1] "staging"
    ],
    expires 3600,
    name "Klaus"
}
vagrant@precise64:/vagrant$ perl examples/02-reader.pl 8.8.8.16
Description: My database of IP data
undef
We gave Klaus an IP range of 8.8.8.8/28, which translates to 8.8.8.0 to 8.8.8.15. You can see that when we get to 8.8.8.16 we get an undef response, because there is no record at this address.
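Translating 8.8.8.8/28 into a first and last address is plain bit arithmetic: mask off the host bits to get the first address, then set them all to get the last. The sketch below does this with Perl's core Socket module only. The helper name ipv4_network_bounds is hypothetical, and the MMDB writer does this work for you internally, so this is purely for illustration.

```perl
use strict;
use warnings;
use feature qw( say );

use Socket qw( inet_aton inet_ntoa );

# Hypothetical helper: first and last address of an IPv4 CIDR block.
sub ipv4_network_bounds {
    my $cidr = shift;
    my ( $ip, $prefix_length ) = split m{/}, $cidr;

    die "invalid prefix length in $cidr\n"
        unless defined $prefix_length
        && $prefix_length =~ /\A\d+\z/
        && $prefix_length <= 32;

    my $packed = inet_aton($ip) or die "invalid address: $ip\n";
    my $address = unpack 'N', $packed;

    # Build the netmask, e.g. /28 => 0xFFFFFFF0.
    my $mask
        = $prefix_length == 0
        ? 0
        : ( 0xFFFFFFFF << ( 32 - $prefix_length ) ) & 0xFFFFFFFF;

    my $first = $address & $mask;
    my $last  = $first | ( ~$mask & 0xFFFFFFFF );

    return map { inet_ntoa( pack 'N', $_ ) } $first, $last;
}

say join ' to ', ipv4_network_bounds('8.8.8.8/28');    # 8.8.8.0 to 8.8.8.15
```

This also shows why 8.8.8.16 returns undef: it is the first address past the end of the block.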
Iterating Over the Search Tree
It takes time to look up every address individually. Is there a way to speed things up? As it happens, there is.
#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );

use Data::Printer;
use MaxMind::DB::Reader;
use Net::Works::Address;

my $reader = MaxMind::DB::Reader->new( file => 'users.mmdb' );

$reader->iterate_search_tree(
    sub {
        my $ip_as_integer = shift;
        my $mask_length   = shift;
        my $data          = shift;

        my $address = Net::Works::Address->new_from_integer(
            integer => $ip_as_integer );
        say join '/', $address->as_ipv4_string, $mask_length;
        say np $data;
    }
);
Iterating: Review
Step 1
As in the previous example, we create a new MaxMind::DB::Reader object.
Step 2
To dump our data, we pass an anonymous subroutine to the iterate_search_tree() method. (This method can actually take two callbacks, but the second callback is for debugging the actual nodes in the tree, which is too low level for our purposes today.)
We’ve appropriately named the three arguments which are passed to the callback, so there’s not much more to say about them. Let’s look at the output.
vagrant@precise64:/vagrant$ perl examples/03-iterate-search-tree.pl
8.8.8.0/28
\ {
    environments [
        [0] "development",
        [1] "staging"
    ],
    expires 3600,
    name "Klaus"
}
123.125.71.29/32
\ {
    environments [
        [0] "development",
        [1] "staging",
        [2] "production"
    ],
    expires 86400,
    name "Jane"
}
The output shows the first IP in each range (note that Jane’s IP is just a “range” of one) and then displays the user data with which we’re now familiar.
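The callback receives each address as an integer. For IPv4, that integer is simply the four octets packed into one 32-bit number. The plain-Perl sketch below shows the conversion, only to illustrate what the integer means. The script above uses Net::Works::Address for this, and the helper name ipv4_from_integer is hypothetical.

```perl
use strict;
use warnings;
use feature qw( say );

# Hypothetical helper: turn a 32-bit integer into a dotted-quad string.
sub ipv4_from_integer {
    my $integer = shift;

    # 'N' packs a 32-bit big-endian integer; 'C4' reads it back as 4 octets.
    return join '.', unpack 'C4', pack 'N', $integer;
}

# 134744064 is 0x08080800, the first address in Klaus's network.
say ipv4_from_integer(134744064) . '/28';    # 8.8.8.0/28
```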
The Mashup
To extend our example, let’s take the data from an existing GeoIP2 database and combine it with our custom MMDB file.
If you’re using the Vagrant VM, you have a copy of GeoLite2-City.mmdb in /usr/share/GeoIP. If not, you may need to download this file or use geoipupdate. For more details on how to set this up, you can look at the provision section of the Vagrantfile in the GitHub repository.
You can take any number of fields from existing MaxMind databases to create your own custom database. In this case, let’s extend our existing database by adding city, country and time_zone fields for each IP range. We can use this information to (possibly) customize the user’s environment. We can use the time zone when displaying dates or times. We can limit access to certain features based on the country in which the user is currently located.
#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );

use GeoIP2::Database::Reader;
use MaxMind::DB::Writer::Tree;
use Net::Works::Network;

my $filename = 'users.mmdb';
my $reader = GeoIP2::Database::Reader->new(
    file    => '/usr/share/GeoIP/GeoLite2-City.mmdb',
    locales => ['en'],
);

# Your top level data structure will always be a map (hash). The MMDB format
# is strongly typed. Describe your data types here.
# See https://metacpan.org/pod/MaxMind::DB::Writer::Tree#DATA-TYPES

my %types = (
    city         => 'utf8_string',
    country      => 'utf8_string',
    environments => [ 'array', 'utf8_string' ],
    expires      => 'uint32',
    name         => 'utf8_string',
    time_zone    => 'utf8_string',
);

my $tree = MaxMind::DB::Writer::Tree->new(

    # "database_type" is an arbitrary string describing the database. At
    # MaxMind we use strings like 'GeoIP2-City', 'GeoIP2-Country', etc.
    database_type => 'My-IP-Data',

    # "description" is a hashref where the keys are language names and the
    # values are descriptions of the database in that language.
    description =>
        { en => 'My database of IP data', fr => "Mon Data d'IP", },

    # "ip_version" can be either 4 or 6
    ip_version => 4,

    # add a callback to validate data going in to the database
    map_key_type_callback => sub { $types{ $_[0] } },

    # let the writer handle merges of IP ranges. if we don't set this then the
    # default behaviour is for the last network to clobber any overlapping
    # ranges.
    merge_record_collisions => 1,

    # "record_size" is the record size in bits. Either 24, 28 or 32.
    record_size => 24,
);

my %address_for_employee = (
    '123.125.71.29/32' => {
        environments => [ 'development', 'staging', 'production' ],
        expires      => 86400,
        name         => 'Jane',
    },
    '8.8.8.8/28' => {
        environments => [ 'development', 'staging' ],
        expires      => 3600,
        name         => 'Klaus',
    },
);

for my $range ( keys %address_for_employee ) {

    my $user_metadata = $address_for_employee{$range};

    # Iterate over network and insert IPs individually
    my $network = Net::Works::Network->new_from_string( string => $range );
    my $iterator = $network->iterator;

    while ( my $address = $iterator->() ) {
        my $ip = $address->as_ipv4_string;
        my $model = $reader->city( ip => $ip );

        if ( $model->city->name ) {
            $user_metadata->{city} = $model->city->name;
        }
        if ( $model->country->name ) {
            $user_metadata->{country} = $model->country->name;
        }
        if ( $model->location->time_zone ) {
            $user_metadata->{time_zone} = $model->location->time_zone;
        }

        # Insert this single address as a /32 so each IP gets its own lookup.
        $tree->insert_network( "$ip/32", $user_metadata );
    }
}

# Write the database to disk, failing loudly if the file can't be written.
open my $fh, '>:raw', $filename or die "Cannot open $filename: $!";
$tree->write_tree( $fh );
close $fh or die "Cannot close $filename: $!";

say "$filename has now been created";
Now, when we iterate over the search tree, we’ll see that the data has been augmented with the new fields.
vagrant@precise64:/vagrant$ perl examples/03-iterate-search-tree.pl
8.8.8.0/28
\ {
    city "Mountain View",
    country "United States",
    environments [
        [0] "development",
        [1] "staging"
    ],
    expires 3600,
    name "Klaus",
    time_zone "America/Los_Angeles"
}
123.125.71.29/32
\ {
    city "Beijing",
    country "China",
    environments [
        [0] "development",
        [1] "staging",
        [2] "production"
    ],
    expires 86400,
    name "Jane",
    time_zone "Asia/Shanghai"
}
Adding GeoLite2-City Data: Review
To extend our example we make two additions to our original file:
Step 1
We create a new reader object:
my $reader = GeoIP2::Database::Reader->new(
    file    => '/usr/share/GeoIP/GeoLite2-City.mmdb',
    locales => ['en'],
);
Note that this file may be in a different location if you’re not using Vagrant. Adjust accordingly.
Step 2
Now, we take our existing data so that we can augment it with GeoIP2 data.
    my $user_metadata = $address_for_employee{$range};

    # Iterate over network and insert IPs individually
    my $network = Net::Works::Network->new_from_string( string => $range );
    my $iterator = $network->iterator;

    while ( my $address = $iterator->() ) {
        my $ip = $address->as_ipv4_string;
        my $model = $reader->city( ip => $ip );

        if ( $model->city->name ) {
            $user_metadata->{city} = $model->city->name;
        }
        if ( $model->country->name ) {
            $user_metadata->{country} = $model->country->name;
        }
        if ( $model->location->time_zone ) {
            $user_metadata->{time_zone} = $model->location->time_zone;
        }

        # Insert this single address as a /32 so each IP gets its own lookup.
        $tree->insert_network( "$ip/32", $user_metadata );
    }
Unlike our first example, which passed CIDR strings straight to the writer, here we create a new Net::Works::Network object so that we can insert each individual IP in the range. The reason for this is that we don’t know if our IP ranges match the ranges in the GeoLite2 database. If we just rely on using the reader data for some arbitrary IP in the range, we can’t be 100% sure that this is representative of all other IPs in the range. If we insert each IP in the range, we don’t need to rely on the assumption that the data for a random IP will be consistent across our ranges.
In order for this to work, we set merge_record_collisions => 1 when we created the MaxMind::DB::Writer::Tree object. This allows the writer to be smart about merging ranges rather than letting a new range clobber any overlapping addresses.
Note that this approach is fine for a small database, but it likely will not scale well in terms of speed when writing a database with a large number of records. If you’re looking to create a very large database and writing speed is an issue, you are encouraged to look into using the MaxMind CSVs to seed your database.
Iterating over a network is trivial.
    my $network = Net::Works::Network->new_from_string( string => $range );
    my $iterator = $network->iterator;

    while ( my $address = $iterator->() ) {
        my $ip = $address->as_ipv4_string;
        ...
    }
The next step is to look up an IP address using the reader.
my $model = $reader->city( ip => $ip );
We need to pass the reader a string rather than an object, so we call the as_ipv4_string() method.
Next we add new keys to the hash. The new keys are country, city and time_zone. Note that we only add them if they exist. If we try to add an undefined value to the hash, an exception will be thrown.
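The "only add it if it exists" rule can be factored into a small helper. This is a sketch in plain Perl, not code from the example repository. The name add_defined_fields is hypothetical, and it tests with defined rather than the truthiness check used in the script, so an empty string would still be copied.

```perl
use strict;
use warnings;
use feature qw( say );

# Hypothetical helper: copy only the defined values into the record, so that
# no key ever ends up holding undef.
sub add_defined_fields {
    my ( $record, %candidates ) = @_;
    for my $key ( sort keys %candidates ) {
        next unless defined $candidates{$key};
        $record->{$key} = $candidates{$key};
    }
    return $record;
}

my $record = add_defined_fields(
    { name => 'Jane' },
    city      => 'Beijing',
    country   => 'China',
    time_zone => undef,    # pretend the lookup returned no time zone
);

# The time_zone key is absent rather than present with an undef value.
say join ', ', sort keys %{$record};    # city, country, name
```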
Now, let’s see what we get.
vagrant@precise64:/vagrant$ perl examples/03-iterate-search-tree.pl
8.8.8.0/28
\ {
    city "Mountain View",
    country "United States",
    environments [
        [0] "development",
        [1] "staging"
    ],
    expires 3600,
    name "Klaus",
    time_zone "America/Los_Angeles"
}
123.125.71.29/32
\ {
    city "Beijing",
    country "China",
    environments [
        [0] "development",
        [1] "staging",
        [2] "production"
    ],
    expires 86400,
    name "Jane",
    time_zone "Asia/Shanghai"
}
Even though we inserted Klaus’s addresses individually, we can see that the writer did the right thing and merged the addresses into an appropriately sized network.
Deploying Our Application
Now we’re at the point where we can make use of our database. With just a few lines of code you can now use your MMDB file to assist in the authorization of your application or VPN users. For example, you might include the following lines in a class which implements your authentication.
use MaxMind::DB::Reader;

my $reader = MaxMind::DB::Reader->new( file => '/path/to/users.mmdb' );

sub is_ip_valid {
    my $self = shift;
    my $ip   = shift;

    my $record = $reader->record_for_address( $ip );
    return 0 unless $record;

    $self->set_session_expiration( $record->{expires} );
    $self->set_time_zone( $record->{time_zone} ) if $record->{time_zone};
    return 1;
}
Here’s a quick summary of what’s going on:
- As part of your deployment you’ll naturally need to include your users.mmdb file, stored in the location of your choice.
- You’ll need to create a MaxMind::DB::Reader object to perform the lookup.
- If the $record is undef, the IP could not be found.
- If the IP is found, you can set a session expiration.
- If the IP is found, you can also set a time zone for the user. Keep in mind that it’s possible that the time_zone key does not exist, so it’s important that you don’t assume it will always be available.
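The record also carries the environments list, which the snippet above does not use. One possible way to check it, sketched in plain Perl against a hash shaped like the records we stored, is shown below. The helper name can_access_environment is hypothetical and is not part of any MaxMind module.

```perl
use strict;
use warnings;
use feature qw( say );

# Hypothetical helper: true if the record grants access to $environment.
# A missing record (undef) or a record without a list is treated as "no".
sub can_access_environment {
    my ( $record, $environment ) = @_;
    return 0 unless $record && ref( $record->{environments} ) eq 'ARRAY';

    my $matches = grep { $_ eq $environment } @{ $record->{environments} };
    return $matches ? 1 : 0;
}

# A record shaped like the one stored for Klaus earlier in this post.
my $klaus = {
    environments => [ 'development', 'staging' ],
    expires      => 3600,
    name         => 'Klaus',
};

say can_access_environment( $klaus, 'staging' )    ? 'allowed' : 'denied';
say can_access_environment( $klaus, 'production' ) ? 'allowed' : 'denied';
say can_access_environment( undef,  'staging' )    ? 'allowed' : 'denied';
```

In a real application, $klaus would be whatever record_for_address() returned for the connecting IP.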
Pro Tips
Including the Contents of an Entire MaxMind DB
To include the contents of an entire GeoIP2 database rather than selected data points, you have a couple of options for iterating over a database in Perl.
MaxMind::DB::Reader
A very simple way to get started is to iterate over the search tree using MaxMind::DB::Reader as we did in examples/03-iterate-search-tree.pl. However, note that iterating over the entire tree using the Perl reader can be quite slow.
Parsing a CSV
This requires slightly more logic, but reading a CSV file line by line will give you a significant speed boost over search tree iteration.
Free downloads of CSV files for GeoLite2 City and GeoLite2 Country are available from MaxMind.com.
If you’re using the Vagrant VM, you’ll find GeoLite2-City-Blocks-IPv4.csv and GeoLite2-City-Locations-en.csv already in your /vagrant directory. examples/06-read-csv.pl will give you a head start on parsing these CSVs.
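To give a feel for the line-by-line approach, here is a rough sketch that reads a tiny in-memory sample and indexes it by network. It is not the code from examples/06-read-csv.pl. The sample rows are invented, and the column names network and geoname_id are assumptions about the blocks file, so check the header of your own download before relying on them.

```perl
use strict;
use warnings;
use feature qw( say );

# Invented sample in the assumed shape of a blocks CSV. Real files have more
# columns and far more rows.
my $csv = <<'END_CSV';
network,geoname_id
192.0.2.0/24,1001
198.51.100.0/24,1002
END_CSV

open my $fh, '<', \$csv or die "Cannot read sample data: $!";

# Read the header so that columns are looked up by name, not by position.
chomp( my $header = <$fh> );
my @columns = split /,/, $header;

my %geoname_id_for_network;
while ( my $line = <$fh> ) {
    chomp $line;
    next unless length $line;

    # Naive split: fine for this sample, but real data may contain quoted
    # fields, so a proper CSV module such as Text::CSV is a safer choice.
    my %row;
    @row{@columns} = split /,/, $line, -1;

    $geoname_id_for_network{ $row{network} } = $row{geoname_id};
}
close $fh;

say "$_ => $geoname_id_for_network{$_}" for sort keys %geoname_id_for_network;
```

Each network could then be passed to insert_network() along with whatever fields you look up for it.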
Insert Order, Merging and Overwriting
It’s important to understand MaxMind::DB::Writer’s configurable behaviour for inserting ranges. Please see our documentation on Insert Order, Merging and Overwriting so that you can choose the correct behaviour for any overlapping IP ranges you may come across when writing your own database files.
Taking This Further
Today we’ve shown how you can create your own MMDB database and augment it with data from a GeoLite2-City database. We’ve only included a few data points, but MaxMind databases contain much more data you can use to build a solution to meet your business requirements.
About our contributor: Olaf Alders is a Senior Software Engineer at MaxMind. After taking his first course in Fortran, Olaf earned an M.A. in Classical Philology from McMaster University and an M.A. in Medieval Studies from the University of Toronto. His open source projects include MetaCPAN.org, as well as various Perl modules. Follow him on Twitter @olafalders.