At IPinfo, a question we get asked quite often is how accurate our data is. It's a straightforward question, really. You make an API call using IPinfo's services or download our database, provide your input IP address, and somehow you're magically presented with the city and country geolocation information and even granular information such as zip code and geographic coordinates. But how does it all work?
We also believe in transparency and accuracy. We want our users to understand how we provide the data and its accuracy. By walking you through our methodology, we aim to give you a more solid understanding of our data and process. And by discussing the nature and extent of our IP geolocation methodology, we hope that you understand not only the opportunities with our data but also its limitations and scope.
One of our secrets to success stems from our globe-spanning probe network infrastructure. We have a vast network of hundreds of interconnected probe servers distributed across the world. We also incorporate many publicly available databases that act as complementary data sources.
Our proprietary probe network infrastructure gives us a significant edge in ensuring the accuracy and reliability of our data products. We keep increasing our investment in developing and expanding the probe network, so our accuracy is not only best in class today but will keep getting better.
IPinfo’s Probe Network Explained
IPinfo’s proprietary probe network is a globe-spanning network of servers, unique to IPinfo, that we use to validate IP addresses and scan the internet. These servers systematically probe individual IP addresses to identify a number of attributes.
Through the probing process, we build a geographic representation of the internet and of how packets of data travel through it. We run ping operations, traceroute analysis, port scanning, and more. From these measurements, we generate several databases, the most prominent of which is our IP geolocation data.
As of May 2023, we have a network of over 350 probe servers across the globe. We started building our probe network a few years ago with servers in North America and Europe, and since then we have significantly ramped up our investment. We now have probe servers in various remote and niche regions to improve our coverage and accuracy.
Even with hundreds of servers, we are not slowing down at all. We are continuously investing in expanding and developing this infrastructure. Every time we launch a new probe server, our data accuracy gets better. Data accuracy to us is a continuous journey, and we are not planning to stop.
Our system works much like GPS: given satellites at known locations and the distance between each satellite and a device, the device can only be in a limited area on Earth. We perform delay measurements between multiple probe servers (our "satellites") and IP addresses to geolocate them.
Let's consider a single server located in Paris, France. We perform a delay measurement from this server to the IP address we want to locate and get a round-trip time of 10 milliseconds, so 5 ms one way. Since most of the internet is made of optical fiber, and light travels at about 200 km per millisecond in fiber, we know that the device must be within 5 × 200 = 1,000 km of Paris. It cannot be farther away; otherwise the signal would have traveled faster than light!
We can refine this geolocation by performing more delay measurements from more locations. For example, if we measure a round-trip time of 1 ms from Berlin (0.5 ms one way, so at most 100 km), the device must be located in the intersection of a circle of radius 1,000 km centered on Paris and a circle of radius 100 km centered on Berlin.
The more vantage points, the more accurate the geolocation. For the cases where we cannot get a small intersection, we use hints from various sources. For example, ISPs might tell us that a device is located in a specific city, or a specific country. - Maxime Mouchet, Data Engineer at IPinfo
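To make the geometry concrete, here is a minimal Python sketch of this kind of constraint-based geolocation. It assumes the roughly 200 km/ms fiber speed quoted above and a spherical Earth (haversine great-circle distance). The candidate cities, the function names, the 100 km uncertainty threshold, and the hint-fallback rule are all hypothetical choices for illustration, not IPinfo's production algorithm. The fallback relies on one geometric fact: the intersection of all disks lies inside the smallest disk, so the smallest radius bounds how precise a measurement-only answer can be.

```python
import math

FIBER_KM_PER_MS = 200.0   # assumption: approximate speed of light in optical fiber
EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical-Earth approximation


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on distance implied by a round-trip time (half of it is one way)."""
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_feasible(point, measurements) -> bool:
    """True if `point` lies inside every disk (vantage point, max distance)."""
    lat, lon = point
    return all(
        haversine_km(lat, lon, vlat, vlon) <= max_distance_km(rtt)
        for (vlat, vlon), rtt in measurements
    )


def locate(measurements, candidates, hint=None, max_uncertainty_km=100.0):
    """Hypothetical locator: feasible candidate names, or a hint if the region is too large."""
    feasible = [name for name, coords in candidates.items() if is_feasible(coords, measurements)]
    # The intersection is contained in the smallest disk, so its radius bounds the uncertainty.
    smallest_radius = min(max_distance_km(rtt) for _, rtt in measurements)
    if smallest_radius > max_uncertainty_km and hint in feasible:
        return [hint]  # e.g., an ISP-provided city that agrees with the measurements
    return feasible


PARIS = (48.8566, 2.3522)
BERLIN = (52.5200, 13.4050)
LONDON = (51.5074, -0.1278)
CANDIDATES = {"Paris": PARIS, "Berlin": BERLIN, "London": LONDON}

# 10 ms RTT from Paris and 1 ms RTT from Berlin, as in the example above.
print(locate([(PARIS, 10.0), (BERLIN, 1.0)], CANDIDATES))  # → ['Berlin']
```

With only the Paris measurement, all three cities fall within 1,000 km, so the sketch would defer to a hint if one is supplied and consistent with the measurements.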
Our view of the Internet
In general, the more vantage points, the more accurate the geolocation. However, this is only part of the story: measuring the delay toward IP addresses is riddled with technical difficulties.
For one, not all equipment replies to probe packets. Some devices sit in corporate or consumer ISP networks that filter part of the traffic, and some reply to traceroute measurements but not to ping measurements.
Some equipment may misreport its identity, replying with an IP address that belongs to other equipment or with a private IP address. Other equipment, such as firewalls and NATs, may modify probe or reply packets, producing erroneous delay measurements.
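As an illustration of how some of these misbehaving replies might be screened out, here is a minimal Python sketch of a sanity check on ping replies. It is an assumption for illustration, not IPinfo's actual filtering logic: it drops replies whose source address is not globally routable (private, loopback, link-local, and so on, according to the `is_global` property of Python's `ipaddress` module) or that come from a different address than the one probed. It cannot detect delays distorted by firewalls or NATs, which would require other techniques.

```python
import ipaddress


def keep_ping_sample(target_ip: str, reply_ip: str) -> bool:
    """Hypothetical filter: keep a ping sample only if the reply is attributable to the target."""
    target = ipaddress.ip_address(target_ip)
    reply = ipaddress.ip_address(reply_ip)
    # A non-global source (private, loopback, link-local, ...) cannot be geolocated.
    if not reply.is_global:
        return False
    # A reply from another address cannot be attributed to the probed target.
    return reply == target


print(keep_ping_sample("8.8.8.8", "8.8.8.8"))   # True
print(keep_ping_sample("8.8.8.8", "10.0.0.1"))  # False: private source address
```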
Some IP addresses do not correspond to a single location, as is the case for anycast IP addresses used by DNS servers (e.g., 184.108.40.206), which map to multiple physical locations.
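Anycast addresses can sometimes be spotted with the same speed-of-light reasoning used for geolocation: if two vantage points are farther apart than the sum of their distance bounds, no single location can explain both delays. The Python sketch below illustrates this check, assuming light covers about 200 km per millisecond in fiber and a spherical Earth; it is a known detection technique shown for illustration, not a description of IPinfo's method. The pairwise test is sufficient but not necessary: three disks can overlap pairwise yet share no common point.

```python
import math
from itertools import combinations

FIBER_KM_PER_MS = 200.0   # assumption: approximate speed of light in optical fiber
EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical-Earth approximation


def haversine_km(p, q) -> float:
    """Great-circle distance between (lat, lon) points given in decimal degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def looks_anycast(measurements) -> bool:
    """measurements: list of ((lat, lon) of vantage point, RTT in ms) toward one IP.

    Returns True if some pair of distance disks cannot overlap, meaning no single
    location is consistent with all the measured delays.
    """
    disks = [(vp, (rtt / 2) * FIBER_KM_PER_MS) for vp, rtt in measurements]
    for (vp1, r1), (vp2, r2) in combinations(disks, 2):
        if haversine_km(vp1, vp2) > r1 + r2:
            return True
    return False


PARIS, BERLIN, TOKYO = (48.8566, 2.3522), (52.5200, 13.4050), (35.6762, 139.6503)
print(looks_anycast([(PARIS, 2.0), (TOKYO, 2.0)]))    # True: 200 km disks roughly 9,700 km apart
print(looks_anycast([(PARIS, 10.0), (BERLIN, 1.0)]))  # False: the disks overlap
```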
The good news is that we do not need to measure every IP address. Consecutive IP addresses used by end hosts (computers, servers) tend to be located near each other (e.g., a.b.c.1 is likely to be geographically close to a.b.c.2). This allows IP addresses to be aggregated into ranges, which makes the network faster and easier to debug by reducing the size of the routing tables in routers. As a result, if we can accurately locate one IP address in a range, we may be able to infer the location of the other devices in the same range.
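This range-based inference can be sketched with Python's standard `ipaddress` module. The anchor table, the fixed /24 prefix length, and the function name are hypothetical simplifications (real allocations vary in size), and the addresses come from the RFC 5737 documentation ranges rather than real measurements. Such a heuristic would only suit end-host ranges, since router addresses do not follow this pattern.

```python
import ipaddress
from typing import Optional

# Hypothetical anchors: one precisely measured IP per range and its location.
# RFC 5737 documentation prefixes are used as placeholders.
MEASURED_ANCHORS = {
    "192.0.2.1": "Paris, FR",
    "198.51.100.7": "Berlin, DE",
}


def infer_location(ip: str, prefix_len: int = 24) -> Optional[str]:
    """Assume an IP shares the location of a measured anchor in the same /prefix_len range."""
    # strict=False lets us build the enclosing network from a host address.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    for anchor_ip, location in MEASURED_ANCHORS.items():
        if ipaddress.ip_address(anchor_ip) in network:
            return location
    return None  # no measured anchor in this range


print(infer_location("192.0.2.2"))    # Paris, FR
print(infer_location("203.0.113.5"))  # None
```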
However, this does not hold for router IPs, which are more scattered geographically. For example, a router in Paris with an IP ending in .1 can link to a router in Brussels with an IP ending in .2.
We perform two kinds of measurements: ping measurements, which return the round-trip time between a vantage point and an IP address, and traceroute measurements, which return the routers on the path and the round-trip time between each of them and our vantage point.
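One way to picture how both kinds of measurement feed the same pool of delay data is to flatten them into (vantage point, IP, round-trip time) samples. The Python sketch below uses hypothetical record types and field names, not IPinfo's data model: a ping yields at most one sample, while a traceroute yields one sample per responding router. Silent hops, which are common because not all equipment replies, are skipped.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Ping:
    vantage: str              # hypothetical probe-server identifier
    target: str
    rtt_ms: Optional[float]   # None when the target did not reply


@dataclass
class Traceroute:
    vantage: str
    target: str
    # One (router_ip, rtt_ms) entry per hop; both are None for silent hops.
    hops: List[Tuple[Optional[str], Optional[float]]] = field(default_factory=list)


def delay_samples(measurement) -> List[Tuple[str, str, float]]:
    """Flatten a measurement into (vantage, ip, rtt_ms) delay samples."""
    if isinstance(measurement, Ping):
        if measurement.rtt_ms is None:
            return []
        return [(measurement.vantage, measurement.target, measurement.rtt_ms)]
    # Each responding traceroute hop is a delay sample toward that router.
    return [
        (measurement.vantage, ip, rtt)
        for ip, rtt in measurement.hops
        if ip is not None and rtt is not None
    ]


trace = Traceroute("probe-1", "203.0.113.10",
                   [("198.51.100.1", 1.2), (None, None), ("198.51.100.9", 8.7)])
print(delay_samples(trace))  # two samples; the silent hop is skipped
```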
Every week, we measure the delay toward 350M IPs and the path toward 50M IPs, which gives us more than 90B delay measurements per week. We discover 3M IPv4 routers and 4M IPv6 routers, 20M IPv4 links between routers (9M for IPv6), 10M links between autonomous systems, and 5M links between countries.
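To put these volumes in perspective, the arithmetic below converts the weekly figure into an average per-second rate. Dividing by the 350 probe servers mentioned earlier assumes an even split of the load, which is an illustrative simplification; actual load is unlikely to be uniform. Since the weekly figure is "more than 90B", both rates are lower bounds.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def per_second(weekly_count: float) -> float:
    """Convert a weekly count into an average per-second rate."""
    return weekly_count / SECONDS_PER_WEEK


total_rate = per_second(90e9)      # at least 90B delay measurements per week
per_probe_rate = total_rate / 350  # assumption: load spread evenly over ~350 probe servers

print(f"{total_rate:,.0f} measurements/s overall")        # 148,810 measurements/s overall
print(f"{per_probe_rate:,.0f} measurements/s per probe")  # 425 measurements/s per probe
```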
IP geolocation and beyond
Our vast historical data puts us in a unique position to detect internet topology pattern changes and use that to optimize the accuracy of geolocation algorithms. - Alex Rodrigues, Data Engineer at IPinfo
By running billions of measurements toward hundreds of millions of IP addresses every week, we are effectively mapping out the internet and gaining insight into how it functions as a whole. In addition to our standard database offerings, we provide sophisticated custom data solutions. Contact our data experts to explore how we can help you develop innovative solutions.
A promise for continuous improvement
IPinfo is more than just a service: behind the data is a robust infrastructure. We have been growing rapidly, continuously investing in our probe network and developing sophisticated, cutting-edge algorithms and research. In a short time, we have built a probe network of 300 servers (as of March 2023), and we are not stopping there. We are constantly developing new and innovative solutions that help you make informed decisions with confidence in the accuracy of our data.
Ready to experience the accuracy and reliability of IPinfo's data products? Contact our data experts to obtain your IP geolocation or explore custom data solutions for your network. Let us help you make informed decisions with confidence.