
Background: ETag tracking is well explained here and also mentioned on Wikipedia.

An answer I wrote in response to "How can I prevent tracking by ETags?" prompted me to write this question.

I have a browser-side solution which prevents ETag tracking. It works without modifying the current HTTP protocol. Is this a viable solution to ETag tracking?

Instead of telling the server our ETag, we ask the server for its ETag and compare it to the one we already have.

Pseudo code:

if (file_not_in_cache)
{
    page=http_get_request();     
    page.display();
    page.put_in_cache();
}
else
{
    page=load_from_cache();
    client_etag=page.extract_etag();
    server_etag=http_HEAD_request().extract_etag();

    //Instead of saying "my etag is xyz",
    //the client says: "what is YOUR etag, server?"

    if (server_etag==client_etag)
    {
        page.display();
    }
    else
    {
        page.remove_from_cache();
        page=http_get_request();     
        page.display();
        page.put_in_cache();
    }
}
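
A minimal Python sketch of the flow above, for illustration only. The `head` and `get` callables and the dict-based cache are hypothetical stand-ins for a browser's network layer and cache; they are assumptions, not part of any real browser API.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for the browser's network layer:
# head(url) returns the server's current ETag, get(url) returns (etag, body).
Head = Callable[[str], str]
Get = Callable[[str], Tuple[str, bytes]]


def fetch(url: str, cache: Dict[str, Tuple[str, bytes]], head: Head, get: Get) -> bytes:
    """Return the page body without ever sending the cached ETag to the server."""
    if url in cache:
        cached_etag, cached_body = cache[url]
        # Ask the server for ITS ETag instead of revealing ours.
        if head(url) == cached_etag:
            return cached_body
        del cache[url]  # stale entry: fall through to a full GET
    etag, body = get(url)
    cache[url] = (etag, body)
    return body
```

On a cache hit the server sees only a bare HEAD; on a miss or a mismatch it sees the HEAD followed by an ordinary GET.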

HTTP conversation example with my solution:

Client:

HEAD /posts/46328 HTTP/1.1
Host: security.stackexchange.com

Server:

HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
ETag: "EVIL_UNIQUE_TRACKING_ETAG"
Content-Type: text/html
Content-Length: 131

Case 1, Client has an identical ETag:

Connection closes, client loads page from cache.

Case 2, client has a mismatching ETag:

GET ... // and a normal HTTP conversation begins.

Extras that do require modifying the HTTP specification

Think of the following as theoretical material, the HTTP spec probably won't change any time soon.

1. Removing HEAD overhead

There is a minor overhead worth noting: the server has to send the HTTP headers twice, once in response to the HEAD and once in response to the GET. One theoretical workaround is to modify the HTTP protocol and add a new method that requests header-less content. The client would then request the headers only (HEAD) and, if the ETags mismatch, the content only.

2. Preventing cache based tracking (or at least making it a lot harder)

Although the workaround suggested by Sneftel is not an ETag tracking technique, it does track people even when they're using the "HEAD, GET" sequence I suggested. The solution would be to restrict the possible values of ETags: instead of being an arbitrary sequence, the ETag has to be a checksum of the content. The client verifies this, and if the checksum it computes does not match the value sent by the server, the cache is not used.
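
A short Python sketch of the client-side check, under stated assumptions: the proposal only says "checksum", so the choice of SHA-256 and the quoted-hex ETag format are illustrative, and the names `expected_etag` and `may_cache` are hypothetical.

```python
import hashlib


def expected_etag(body: bytes) -> str:
    # Assumption: the standardized ETag is the quoted SHA-256 hex digest of the body.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def may_cache(etag_header: str, body: bytes) -> bool:
    """Allow caching only when the server's ETag matches the content checksum."""
    return etag_header == expected_etag(body)
```

With this rule, an arbitrary per-user value such as `"EVIL_UNIQUE_TRACKING_ETAG"` fails the check, so the response is never stored.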

Side note: fix 2 would also eliminate the following Evercookie tracking techniques: pngData, etagData, cacheData. Combining that with Chrome's "Keep local data only until I quit my browser" eliminates all Evercookie tracking techniques except Flash and Silverlight cookies.

Hello World
  • Given that you posted this on StackOverflow, what is the actual programming problem you're trying to solve? This seems like a request for comments and opinions, which is not what SO is for and will probably get your question closed under the "asking for an opinion" reason. – Mike 'Pomax' Kamermans Dec 02 '13 at 18:59
  • I am trying to prevent ETag tracking by modifying the way browsers ask for pages. This is a programming issue, because implementing it involves modifying the way browsers work and not the HTTP protocol. I am not asking for opinion; I am asking for objective objections to this fix and looking for possible flaws that would prevent it from working. However, this is closely related to security and networking, and I agree that it may be more suitable on a different site. I can do nothing but wait for the decision of the SO guys. – Hello World Dec 02 '13 at 19:10
  • I have omitted the word "opinion" from the question. – Hello World Dec 02 '13 at 19:17
  • How are you implementing `load_from_cache()`? I'm not familiar with any JavaScript mechanism to allow direct access to the cache. Also, if you don't supply an ETag or any cookies (or any other means of identifying yourself) in your `HEAD` request, you're likely to get served a new ETag, which seems just about as useful as clearing your cache. – apsillers Dec 02 '13 at 20:58
  • Note that this is pseudo code; I haven't implemented load_from_cache yet. The idea is to modify the source of the browser, so this has nothing to do with JavaScript. Regarding your second argument: one is not supposed to get a new ETag unless the content changed, regardless of what the HEAD request looks like. If you are getting a new ETag for each request, then the server is doing something nasty, and not using the cache for that specific request would be the safe thing to do. This is more useful than clearing the cache because it is equivalent to clearing the cache only for ETag-tracking servers. – Hello World Dec 03 '13 at 05:02
  • The best solution would be to disable ETag caching altogether in the browser's private mode (at the moment you can set ETags in normal mode and identify users after they start private mode). I see no workaround that would prevent this kind of tracking; only the tracking implementation will differ. – Manuel Arwed Schmidt Jun 17 '15 at 11:52

3 Answers


It sounds reasonable, but workarounds exist. Suppose the front page was always given the same etag (so that returning visitors would always load it from cache), but the page itself referenced a differently-named image each time it was loaded. Your GET or HEAD request for this image would then uniquely identify you. Arguably this isn't an etag-based attack, but it still uses your cache to identify you.
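
A toy Python model of this workaround, to make the mechanism concrete. The `TrackingServer` class, its method names, and the `/img/...` URL scheme are all hypothetical; this is a sketch of the idea, not any real server's behavior.

```python
import secrets
from typing import Dict


class TrackingServer:
    """Toy model: constant ETag on the front page, unique image name in its body."""

    FRONT_PAGE_ETAG = '"constant"'  # same for everyone, so the HEAD check always passes

    def __init__(self) -> None:
        self.visitors: Dict[str, int] = {}  # image name -> visitor number

    def front_page(self) -> str:
        # Each freshly served copy references a differently named image.
        image = f"/img/{secrets.token_hex(8)}.png"
        self.visitors[image] = len(self.visitors) + 1
        return f'<img src="{image}">'

    def identify(self, requested_image: str) -> int:
        # Any HEAD or GET for the image reveals which cached copy asked for it.
        return self.visitors[requested_image]
```

A returning visitor renders the cached front page, which still names the same image, so the request for that image identifies them without any ETag being sent.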

Sneftel
  • Wonderful idea! I think I've found a defense against that too. I will modify my question to take this into account. – Hello World Dec 03 '13 at 05:04
  • Question updated. Assuming the HTTP protocol changes are applied, would people become immune to cache tracking? I firmly believe the answer is yes. – Hello World Dec 03 '13 at 05:32
  • Couple of problems: (1) the mtime is sometimes used as the etag; this would prevent proper caching, since it could not be properly verified. (2) MD5 is sometimes used for the etag; this is susceptible to collision attacks. – Sneftel Dec 03 '13 at 10:43
  • (1) What I proposed in "2. Preventing cache based tracking" is standardizing what an Etag should be. (2) I don't see how this is related to collision attacks, could you explain further? – Hello World Dec 03 '13 at 10:52
  • (1) Yes, if you standardize it as a particular hash then that's fine, but good luck getting every website you'd like cached to go along with it. – Sneftel Dec 03 '13 at 10:57
  • (2) The existence of collision attacks means that the host could serve you one of many different pages, all of which had the same hash value. This would convince you to use the cached (but unique to you) page to request the linked resources. – Sneftel Dec 03 '13 at 10:59
  • (1) The subtitle says "Extras that do require modifying the HTTP specification". (2) +1 Agreed; however, that would require huge computational power, especially with big hashes. I don't think it's practical. – Hello World Dec 03 '13 at 11:13
  • I chose this as the best answer because it is a valid, simple, minimalist workaround (Although it is NOT an ETag-based attack). And the HTTP spec isn't changing any time soon. – Hello World Dec 03 '13 at 17:12

As long as any caching is used there's a potential exploit, even with the HTTP changes. Suppose the main page includes 100 images, each one randomly drawn from a potential pool of 2 images.

When a user returns to the site, her browser reloads the page (since the checksum doesn't match). On average, 25 of the 100 images will be cached from before. This combination can (almost certainly) be used to individually fingerprint the user.

Interestingly, this is almost exactly how DNA paternity testing works.
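
A small Python simulation of the image-pool scheme described in this answer, assuming a single earlier visit and a pool of exactly 2 variants per slot. All function names are hypothetical, and the cache is modeled as a plain set of `(slot, variant)` pairs.

```python
import random

SLOTS = 100  # images per page, as in the answer
POOL = 2     # variants per image slot


def serve_page(rng: random.Random) -> list:
    """The server picks one variant at random for every slot."""
    return [rng.randrange(POOL) for _ in range(SLOTS)]


def first_visit(rng: random.Random):
    page = serve_page(rng)
    cache = set(enumerate(page))  # the browser caches every image it was shown
    return page, cache


def return_visit(rng: random.Random, cache: set):
    page = serve_page(rng)
    # The browser requests only the images it does not already hold.
    requested = {(s, v) for s, v in enumerate(page) if (s, v) not in cache}
    return page, requested


def infer_fingerprint(page: list, requested: set) -> list:
    # Not requested: that variant was cached. Requested: the other one was.
    return [v if (s, v) not in requested else 1 - v for s, v in enumerate(page)]
```

In this model the server recovers the visitor's full 100-slot pattern from a single return visit, because every slot leaks one bit whether or not the image is requested. The sketch does not model the cache filling up over repeated visits.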

Sneftel
  • Thank you, that is very enlightening. However it's about exploiting the cache and not the ETag directly. My solution (Without the HTTP changes) still works against pure ETag-attacks. You've proven that cache tracking is indeed much harder to stop, even with the HTTP changes. I will post a separate question regarding cache-based tracking. – Hello World Dec 03 '13 at 11:47
  • Not critical to your point, but just wondering: how did you get to the number 25? – Hello World Dec 03 '13 at 13:57
  • Sorry, that should read 50. The 25 was from a previous idea I was considering, where one of the items in each pair was randomly generated per-load. – Sneftel Dec 03 '13 at 14:45
  • This specific technique would fail in its current form. On average, 50 images will be requested after the first visit, 25 after the second, and so on. After a couple of visits the browser will almost certainly not request any images, and tracking will be lost. Your point is still valid, though, and I see the problem. – Hello World Dec 03 '13 at 16:12
  • For maximum practicality, several sets of images would be used, with round-robin cache expiry dates. That would ensure that, for a reasonable range of revisit frequencies, at least one of the sets would provide effective fingerprinting. – Sneftel Dec 03 '13 at 16:28

The server could detect that for a number of resources you do a HEAD request which is not followed by a GET for the same resource. That's a tell if you were playing poker.
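
A hedged Python sketch of how a server might spot that tell in its request log. The log format (a sequence of `(method, path)` pairs for one client) and the function name `heads_without_get` are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def heads_without_get(log: Iterable[Tuple[str, str]]) -> List[str]:
    """Return paths whose HEAD was never followed by a GET for the same path."""
    pending: List[str] = []
    for method, path in log:
        if method == "HEAD" and path not in pending:
            pending.append(path)
        elif method == "GET" and path in pending:
            pending.remove(path)
    return pending
```

Every path returned is a resource the client evidently already holds in its cache.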

Just by having some resources cached, you are storing information. That information can be deduced by the server any time you do not re-request a resource named on the page.

Protecting your privacy in this manner comes at the cost of having to download every resource on the page with every visit. If you ever cache anything then you are storing information that can be inferred from your requests to the server.

Especially on mobile, where your bandwidth is more expensive and often slower, downloading all page resources on every visit could be impractical. I think at some level you have to accept that there are patterns in your interaction with the website which could be detected and profiled to identify you.

Mnebuerquo