For each transaction that the Arkose Labs network protects, one of the most basic signals we receive is the IP address of the client that connects to our services. Web security vendors have used this signal for over a decade. But of course, with the sorts of attacks that we are dealing with nowadays, the IP address alone is pretty useless: IP blacklisting and IP rate limiting lost their effectiveness years ago. Still, there is much information we can derive from the IP address:
- Geo-location: where in the world is this IP address located (country, state, city, longitude, latitude, timezone)
- Network: which network the IP address is assigned to (BGP Autonomous System Number, a.k.a. ASN), ISP, company name, connection type.
- Reputational: Is the IP address associated with a proxy or VPN service? Is the traffic velocity coming from that IP higher than the norm? Is it commonly used in fraudulent schemes?
At Arkose Labs, we take advantage of the above signals to improve the quality of our decisions: we use them either to reinforce other signals and reduce the risk of false negatives, or to tune out other signals and reduce the risk of false positives. The combination of signals we get from IP intelligence is also sometimes deterministic enough to detect bad traffic on its own. Here are a few examples that illustrate how we use IP intelligence:
A legitimate client coming from an unexpected location
Often, the data we collect on the client side with our JavaScript indicates that we’re dealing with a legitimate device: for example, the latest version of Chrome running on a recent 13” MacBook Pro with the latest version of macOS. However, the network and reputation information may tell a different story: the IP address belongs to a data center, and the overall reputation of the IP is bad. In that case, we should probably challenge this traffic, since a consumer laptop connecting from a data center is highly unexpected.
A very bad client
Sometimes all of the signals correlate strongly: an invalid device, a very bad reputation, a high request velocity, and traffic coming from a data center. In these scenarios, it makes sense to challenge the client with some of the most complex strategies.
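The two scenarios above can be sketched as a small decision function. This is only an illustration under stated assumptions, not how the Arkose Labs engine works: the `Signals` fields, the `Action` names, and the rule ordering are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    HARD_CHALLENGE = "hard_challenge"


@dataclass
class Signals:
    """Hypothetical per-transaction signals (names are assumptions)."""
    device_looks_legitimate: bool  # from client-side collection
    from_data_center: bool         # network signal
    bad_ip_reputation: bool        # reputational signal
    high_request_velocity: bool    # reputational signal


def decide(s: Signals) -> Action:
    ip_flags = sum([s.from_data_center, s.bad_ip_reputation, s.high_request_velocity])
    # "A very bad client": invalid device and every IP signal agrees.
    if not s.device_looks_legitimate and ip_flags == 3:
        return Action.HARD_CHALLENGE
    # "A legitimate client coming from an unexpected location":
    # data center origin combined with a bad reputation.
    if s.from_data_center and s.bad_ip_reputation:
        return Action.CHALLENGE
    # Assumption: an invalid device alone still earns a regular challenge.
    if not s.device_looks_legitimate:
        return Action.CHALLENGE
    return Action.ALLOW
```

With these rules, a legitimate-looking device on a data center IP with a bad reputation gets a regular challenge, and the hardest challenge is reserved for traffic where the device and all of the IP signals look bad.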
Fraud farms
Since our product includes a captcha, we regularly see attackers outsourcing the puzzle solving to semi-legitimate organizations that leverage low-cost labor. We refer to them as “fraud farms” or “sweatshops”; you may also hear the term “click farms”.
Low-cost labor is usually sourced from developing countries such as Vietnam, the Philippines, China, and Bangladesh, to name a few, where earning 50 cents per hour still makes financial sense. As you can guess, the country where the IP is located is therefore important. Coupled with a low-end device and a bad IP reputation (because of the significant amount of traffic), it can help us decide to apply more pressure on this traffic, so that the fraud farm worker has to play more rounds of more difficult games to complete the task.
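As a rough sketch of how those three signals might translate into pressure, the function below counts how many of them match and scales the number of rounds and the difficulty accordingly. The watchlist, the round counts, and the difficulty labels are illustrative assumptions, not actual product settings.

```python
from dataclasses import dataclass

# Hypothetical watchlist using the countries named in the text
# (ISO 3166-1 alpha-2 codes); a real list would be data-driven.
WATCHED_COUNTRIES = {"VN", "PH", "CN", "BD"}


@dataclass
class Pressure:
    rounds: int       # number of puzzle rounds to play (assumed values)
    difficulty: str   # assumed labels: "easy", "medium", "hard"


def fraud_farm_pressure(country: str, low_end_device: bool,
                        bad_ip_reputation: bool) -> Pressure:
    # Count how many of the three fraud-farm indicators are present.
    matches = sum([
        country.upper() in WATCHED_COUNTRIES,
        low_end_device,
        bad_ip_reputation,
    ])
    if matches == 3:
        return Pressure(rounds=5, difficulty="hard")
    if matches == 2:
        return Pressure(rounds=3, difficulty="medium")
    # Zero or one indicator is not enough to raise the pressure.
    return Pressure(rounds=1, difficulty="easy")
```

In this sketch the country alone never raises the pressure; it only counts when at least one of the other two indicators agrees with it.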
IP intelligence is also very valuable for understanding the attacker’s strategy and the resources they employ. It’s very common to see botnets that consist of well over 10,000 nodes. In the simplest setups, those nodes are spread across a couple of countries or networks. But more often, we see a quasi-global distribution of the traffic. Also, whereas in the past deploying these botnets within data centers was more common, attackers are increasingly taking advantage of the proliferation of proxy services worldwide. Some of these proxy services, like BrightData (formerly Luminati), Soax.com, OxyLabs, or Storm Proxies, include residential and mobile IPs at a premium, further blurring the line between what’s legitimate and what’s suspicious. This is not necessarily good news for us web security vendors, but understanding how attackers operate is hugely important so that we can design the next generation of our detection engine.
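One simple way to tell a concentrated botnet from a quasi-global one is to measure how much of the traffic comes from the few most common countries (or ASNs). The sketch below assumes the country of each node has already been resolved. The function names, the choice of the top two countries, and the 80% threshold are arbitrary illustrations.

```python
from collections import Counter
from typing import Iterable


def top_share(values: Iterable[str], k: int = 2) -> float:
    """Fraction of observations that fall in the k most common values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


def describe_distribution(countries: Iterable[str], k: int = 2,
                          threshold: float = 0.8) -> str:
    # Assumed rule: if the top-k countries carry at least `threshold`
    # of the traffic, call the botnet concentrated.
    if top_share(countries, k) >= threshold:
        return "concentrated"
    return "widely distributed"


# Example: country codes for ten attacking nodes.
nodes = ["US"] * 6 + ["DE"] * 3 + ["BR"]
label = describe_distribution(nodes)  # top two countries hold 9 of 10 nodes
```

The same function works for networks by passing ASNs instead of country codes.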
In summary, IP intelligence on its own is generally not enough for an effective detection strategy. However, it is a very valuable source of contextual information that, when combined with other data sources, can be used to build accurate and reliable detection methods.