Saturday, May 19, 2007

Tracking users with Cache Data

There are several methods that browsers and web servers use to speed up browsing, so that less data needs to be transferred over the network. Two of these methods are the ETag/If-None-Match and Last-Modified/If-Modified-Since headers. The premise is fairly simple for both.

With the ETag/If-None-Match headers, the server sends an ETag header along with a resource the first time it is requested. The next time the browser needs the same resource, it sends an If-None-Match request header whose value is the value the server returned in the ETag response header.

If the server responds with a 304 Not Modified status, and does not return a message body (it MUST NOT return a message body), then the ETag is preserved in the cache, and the browser will keep sending the same If-None-Match header until the cache is deleted, as long as it keeps getting 304 replies.
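The ordinary (non-tracking) exchange described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and deriving the ETag from a hash of the body is my assumption, since a server is free to pick any opaque value.

```python
import hashlib

def make_etag(body: bytes) -> str:
    # Assumption: the ETag is a quoted hash of the content.
    return '"' + hashlib.sha1(body).hexdigest() + '"'

def respond(body: bytes, request_headers: dict):
    """Return (status, headers, body) for one request to a resource."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The browser's cached copy is still current: 304 and no body.
        return 304, {"ETag": etag}, b""
    # First request, or the cached copy is stale: send the full resource.
    return 200, {"ETag": etag}, body
```

Calling respond() with no request headers gives a 200 with an ETag; calling it again with that ETag in If-None-Match gives a 304 with an empty body.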

The Last-Modified/If-Modified-Since headers work in exactly the same way, just with different header names and a date as the value.

Sadly though, the ETag/If-None-Match headers are only supported by Firefox, whereas the Last-Modified/If-Modified-Since headers are supported in Firefox and IE - to my knowledge (through my testing) neither pair of headers is supported in Opera.

As such it would be better to use the Last-Modified/If-Modified-Since headers.

All you need to do now is embed a tracking image in each page, send a unique Last-Modified date whenever a request arrives with no If-Modified-Since header, and send a blank 304 response at all other times.
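Here is a rough sketch of what the tracking image handler might look like, in Python. Everything named here (handle_pixel, IMAGE_BYTES, the base timestamp) is hypothetical. The in-memory counter is only for illustration, and the Cache-Control header is my own addition to encourage the browser to revalidate on every page view rather than silently reuse its copy.

```python
import itertools
from email.utils import formatdate

# Assumption: an arbitrary starting point (2000-01-01 00:00:00 UTC).
# HTTP dates have one-second resolution, so each visitor gets its own second.
_BASE = 946684800
_counter = itertools.count()

# Placeholder only: substitute the bytes of a real image here.
IMAGE_BYTES = b"<image bytes go here>"

def handle_pixel(request_headers: dict):
    """Return (status, headers, body, visitor_date) for one image request."""
    seen = request_headers.get("If-Modified-Since")
    if seen is not None:
        # Returning visitor: the echoed date is the identifier.
        # A blank 304 keeps the cached entry (and its date) alive.
        return 304, {}, b"", seen
    # New visitor: hand out a date nobody else has been given.
    date = formatdate(_BASE + next(_counter), usegmt=True)
    headers = {
        "Last-Modified": date,
        "Content-Type": "image/gif",
        "Cache-Control": "no-cache",  # assumption: force revalidation
    }
    return 200, headers, IMAGE_BYTES, date
```

A real deployment would need to persist the counter (or pick dates some other way) so that identifiers are not reused after a restart.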

The biggest problem here though is that you do need a separate HTTP request, and as such the only way to associate requests is per IP and time frame, e.g. any request made <=10 seconds before the request for a particular date/ETag, from the same IP, is the same user. You could also try using the Referer header, but the odds of someone denying cookies but still sending Referers are very low, IMO.
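The IP and time frame matching could be sketched like this in Python. The PageHit record and attribute_hits function are invented for the example, and the 10 second window is just the figure used above.

```python
from dataclasses import dataclass

@dataclass
class PageHit:
    ip: str
    time: float  # seconds since some epoch
    url: str

def attribute_hits(page_hits, pixel_ip, pixel_time, window=10.0):
    """Return the page hits assumed to belong to one tracking-image request.

    A page hit matches if it came from the same IP and happened no more
    than `window` seconds before the image request.
    """
    return [
        hit for hit in page_hits
        if hit.ip == pixel_ip and 0 <= pixel_time - hit.time <= window
    ]
```

This is a guess, not proof: several users behind one NAT or proxy share an IP, so two people loading pages within the same window would be merged.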

You could also use JavaScript instead of images, and then you would be able to link requests more easily, but it would require you to make an additional request from that page with the URL and tracking id in the query string, or similar.

You would still need to use one of these techniques though, because you need to serve different pieces of JavaScript to different people, and have each piece of JavaScript cached for as long as possible.
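As a sketch of the JavaScript variant, the server could generate a small per-visitor script along these lines (Python generating the script text). The /hit reporting path and the id and url parameter names are made up; the script would still have to be served with a unique Last-Modified date or ETag so that each visitor keeps their own copy cached.

```python
import json

def build_tracking_script(tracking_id: str, report_path: str = "/hit") -> str:
    """Build the JavaScript served to one visitor.

    The script reports the current page URL together with the visitor's
    tracking id by requesting an image from `report_path`.
    """
    return (
        "(function () {\n"
        "  var id = %s;\n"
        "  var img = new Image();\n"
        "  img.src = %s + '?id=' + encodeURIComponent(id) +\n"
        "            '&url=' + encodeURIComponent(location.href);\n"
        "})();\n"
    ) % (json.dumps(tracking_id), json.dumps(report_path))
```

json.dumps is used so the id and path end up as properly quoted JavaScript string literals.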

Even so, this gives you a method to track users who deny cookies between browsing sessions - for tighter correlation during browsing sessions you could use Jeremiah Grossman's Basic Auth Tracking.

P.S. This is stored with the other cache data, so this will only work as long as the image/resource is cached, and clearing the cache manually (or turning the cache off) will stop this technique.


Kishor said...

Another way is to implement something using Flash. Like Yahoo! sign in seal. Not many people will know how to bypass this kind of tracking.

kuza55 said...

True, I forgot about the Flash store, but I don't think Flash would be too effective, since the people you want to track would probably just disable Flash, Java, etc - they are blocking cookies (and presumably hopping proxies) after all.

Anonymous said...

AT-HE says:

hi ... i am trying to do this using php

in referer doc you can put some like [img src="trackimg.php?unique=3246246453"] (unique code generated randomly with php) and put an entry with that code in some file or db server side

then in the image which is a php script really , you can capture the headers and variables sent by browser and compare with file/db server side

next send header "304 not modified" ... or "200 OK" plus headers "content-type: image/xx" and dump image data

i am trying to set up a public server with forums and so, and i am trying to catch if somebody signs up another account and does bad things, even refreshing ip, deleting cookies or something

can you do this ? .. please tell me by mail/msn at : at_he @