Predicting the location of one-hop proxy users

Herein I describe a simple technique that attempts to determine the location of a user in relation to their proxy. Obvious use cases include restricting content based on the user's location, augmenting existing fraud metrics for banks and online payment systems, and use by law enforcement. For anonymity systems, this technique should exemplify why onion-based routing systems such as Tor are important.

It’s simple: if we can determine the network latency between a proxy user and their proxy, we can begin to make educated guesses about their location, or at least about how close they are to the proxy. This latency can be determined by analyzing any sequential, serialized traffic. As an example, imagine a website whose HTML requests a script early in the head section. The browser will begin processing the HTML and then immediately request the script:

1: Users_browser     GET index.html  -->  Server
2: Users_browser <-- index.html data      Server
[browser processes index.html and sees a required script]
3: Users_browser     GET script.js   -->  Server
4: Users_browser <-- script.js data       Server

If the user is sending and receiving via a proxy, we need to figure out the latency between the user and the proxy, as well as the latency between the proxy and the server. The requests, with mnemonics representing the latencies we need to determine (l_pu = latency between proxy and user, l_ps = latency between proxy and server), look as follows:

                [       l_pu       ]     [l_ps]
1: Users_browser     index.html --> Proxy  --> Server
2: Users_browser <-- index.html     Proxy <--  Server
3: Users_browser     script.js  --> Proxy  --> Server
4: Users_browser <-- script.js      Proxy <--  Server

Determining l_ps can be done in many ways, the simplest being to send a ping request from the server to the IP address of the proxy. Note that ping reports a round trip time, so the one-way l_ps is half of the reported value. To determine l_pu, the server sets a time stamp when it sends back the index.html data (line 2) and subtracts this from the time it receives the request for script.js (line 3). The result is the total round trip time between the user and the server (tRTT). Now l_pu can be calculated as:

l_pu = (tRTT - (l_ps * 2)) / 2
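As a minimal sketch of that calculation in Python: the function name, parameter names and the choice of seconds as the unit are my own assumptions for illustration, not part of any existing tool. It also assumes l_ps is taken as half of a ping round trip, and it clamps negative results to zero because measurement noise can make tRTT smaller than the ping time.

```python
def estimate_l_pu(t_index_sent, t_script_received, ping_rtt_to_proxy):
    """Estimate the one-way latency between user and proxy, in seconds.

    t_index_sent      -- server time when the index.html data was sent (line 2)
    t_script_received -- server time when the script.js request arrived (line 3)
    ping_rtt_to_proxy -- round trip time of a ping from server to proxy

    All names here are hypothetical; this only mirrors the formula above.
    """
    # Total round trip time between user and server (tRTT).
    t_rtt = t_script_received - t_index_sent

    # Assumption: ping reports a round trip, so the one-way l_ps is half of it.
    l_ps = ping_rtt_to_proxy / 2.0

    # l_pu = (tRTT - (l_ps * 2)) / 2
    l_pu = (t_rtt - (l_ps * 2.0)) / 2.0

    # Noise can push the estimate below zero; treat that as "no measurable latency".
    return max(l_pu, 0.0)
```

For example, a gap of 250 ms between sending index.html and receiving the script.js request, combined with a 50 ms ping to the proxy, gives an estimate of roughly 100 ms each way between user and proxy. Browser processing time is included in that figure, so it should be read as an upper estimate.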

This simplification ignores a lot, such as the latency overhead added by how different browsers process certain content, or the discrepancies between users and proxies whose different types of uplinks affect the latency (dialup, DSL, ISDN, cellular GPRS and UMTS). Even with such caveats, this technique can still be used for applications such as restricting content to users that “should” be close to their exit node or proxy. If the caveats can be handled, the potential exists for using a basic latency map of long-haul, cross-continent network backbones to give clues such as whether a proxy user accessing a system in the UK is coming from the west, South America or the east. On its own the use is limited, but as one clue among others it has more relevant application.
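To illustrate the "should be close to their proxy" check, here is a rough sketch that turns an l_pu estimate into an upper bound on distance. It assumes signals travel through optical fibre at about two thirds of the speed of light, roughly 200 km per millisecond; the constant, the function names and the threshold are hypothetical. Because uplink and browser overhead inflate l_pu, the real distance is usually far smaller than this bound, so the check can only confirm that a user is plausibly near, never prove that they are far away.

```python
# Assumption: roughly 2/3 of the speed of light in optical fibre.
FIBER_KM_PER_MS = 200.0


def max_distance_km(l_pu_ms):
    """Upper bound on the user-to-proxy distance for a one-way latency in ms."""
    return max(l_pu_ms, 0.0) * FIBER_KM_PER_MS


def within_km(l_pu_ms, limit_km):
    """True if even the upper bound places the user within limit_km of the proxy."""
    return max_distance_km(l_pu_ms) <= limit_km
```

With these assumptions, an l_pu of 5 ms bounds the user to within 1000 km of the proxy, while an l_pu of 20 ms only bounds them to within 4000 km and so would fail a 1500 km limit.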

This technique can be applied with greater accuracy to other systems. Dor Levi and I developed a basic server-side application that flags users who might be using a proxy, but this could be done client side as well. For example, a Flash applet could be built and included in online auctions so that either the seller or the auction house could detect when suspicious bids are coming from behind a proxy.

I would be interested to hear of other research in this area. Personally, I am a strong advocate of stronger anonymity systems and a sporadic developer who helps build them where I can. I am also an absolute technologist and believe that evolution in this field requires progress from all directions.

