If you use a GeoIP database, you’re probably familiar with MaxMind’s MMDB format.
At MaxMind, we created the MMDB format because we needed a format that was very fast and highly portable. MMDB comes with supported readers in many languages. In this blog post, we’ll create an MMDB file that contains an access list of IP addresses. This kind of database could be used to decide who is allowed to access a VPN or a hosted application.
Tools You’ll Need
The code samples I include here use the Perl MMDB database writer and the Perl MMDB database reader. You’ll need Perl to write your own MMDB files, but you can read them with the officially supported .NET, PHP, Java and Python readers, as well as with unsupported third-party MMDB readers. Many are listed on the GeoIP2 download page. So, as far as deployment goes, you’re not tied to any one language when you want to read from the database.
Use our GitHub repository to follow along with the actual scripts. Fire up a pre-configured Vagrant VM or just install the required modules manually.
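Before diving into the scripts, it may help to see what an access-list lookup has to answer. The sketch below is only an illustration and does not read or write MMDB files: it uses Python’s standard `ipaddress` module, and the networks, record fields and the `lookup` and `is_allowed` helpers are hypothetical names made up for this example. It also assumes that when several networks contain an address, the most specific one should win.

```python
import ipaddress

# Hypothetical access list mapping networks to records. The networks are
# documentation ranges and the field names are invented for this sketch.
ACCESS_LIST = {
    ipaddress.ip_network("192.0.2.0/24"): {
        "name": "office",
        "environments": ["development", "staging"],
    },
    ipaddress.ip_network("192.0.2.128/25"): {
        "name": "ops",
        "environments": ["development", "staging", "production"],
    },
    ipaddress.ip_network("2001:db8::/32"): {
        "name": "remote",
        "environments": ["development"],
    },
}


def lookup(address):
    """Return the record for the most specific network containing address.

    Returns None when no network in the access list contains the address.
    """
    ip = ipaddress.ip_address(address)
    matches = [
        net for net in ACCESS_LIST
        if net.version == ip.version and ip in net
    ]
    if not matches:
        return None
    # Assumption of this sketch: the longest prefix is the best match.
    best = max(matches, key=lambda net: net.prefixlen)
    return ACCESS_LIST[best]


def is_allowed(address, environment):
    """Check whether address may access the given environment."""
    record = lookup(address)
    return record is not None and environment in record["environments"]
```

With this in place, `is_allowed("192.0.2.200", "production")` checks the more specific `ops` record, while an address outside every listed network is simply denied. A linear scan like this is fine for a handful of networks; a file format built for fast lookups is what you want once the list grows or has to be shared across languages.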
The Apache Nutch community has been hard at work developing an open source web crawler. Nutch is a mature, production-ready web crawler powering data acquisition, search and discovery for a broad spectrum of organizations across an even broader spectrum of use cases. The Nutch 1.x branch enables fine-grained configuration and relies on Apache Hadoop™ data structures, which are great for batch processing.
This post documents how reverse geolocation features were added to Nutch via MaxMind’s GeoIP2-java API, making good use of server IP addresses acquired within a Nutch crawl. Readers will take away:
- insight into why geocoding is appealing in today’s markets,
- practical code examples from the Nutch 1.x branch, showing how to use the GeoIP2-java API to geocode based on server IPs.
When it comes to choosing between the multiple IP geolocation data providers out there, our customers have told us they are most interested in one thing – accuracy. The question is, who provides the most accurate data?