Caching reverse proxy for dynamic content

Question

I was thinking about asking on Software Recommendations, but then I realized it may be too unusual a request and needs some clarification first.

My points are:

- Each response contains an etag
  - which is a hash of the content
  - and which is globally unique (with sufficient probability).
- The content is (mostly) dynamic and may change at any time (expires and max-age headers are useless here).
- The content is partly user-dependent, as determined by the permissions (which themselves change sometimes).
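A minimal sketch of the first premise, assuming the etag is a truncated SHA-256 of the response body (the question doesn't say which hash is used). Because the key depends only on the bytes, two users who are served identical content produce the same etag and can share one cache entry:

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Strong etag derived only from the response bytes.

    Assumption: SHA-256 truncated to 128 bits is "globally unique with
    sufficient probability" for this use.
    """
    digest = hashlib.sha256(body).hexdigest()[:32]  # 32 hex chars = 128 bits
    return f'"{digest}"'  # entity-tags are quoted strings on the wire


if __name__ == "__main__":
    same_a = make_etag(b"<p>catalog 123, item 456</p>")
    same_b = make_etag(b"<p>catalog 123, item 456</p>")  # e.g. served to another user
    other = make_etag(b"<p>catalog 123, item 457</p>")
    print(same_a == same_b, same_a == other)  # True False
```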

Basically, the proxy should contain a cache mapping the etag to the response content. The etag is obtained from the server, and in the most common case, the server does not touch the response content at all.

It should work as follows: the proxy always sends a request to the server, and then either

1. the server returns only the etag, and the proxy looks it up:
   - 1.1 on a cache hit,
     - it reads the response data from its cache
     - and sends a response to the client
   - 1.2 on a cache miss,
     - it asks the server again, and then
     - the server returns the response with content and etag,
     - the proxy stores it in its cache
     - and sends a response to the client
2. or the server returns the response with content and etag,
   - the proxy stores the data in its cache
   - and sends a response to the client
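The flow above can be sketched as a toy, in-process Python model (not a real proxy). `OriginReply`, the `force_body` flag (standing in for however the proxy says "send me the content this time" in case 1.2) and the toy origin are hypothetical names chosen for illustration:

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class OriginReply:
    etag: str
    body: Optional[bytes]  # None means the origin sent the etag only (case 1)


# Hypothetical origin interface: (path, user, force_body) -> OriginReply
Origin = Callable[[str, str, bool], OriginReply]


class EtagCachingProxy:
    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.cache: Dict[str, bytes] = {}  # etag -> response body
        self.origin_calls = 0

    def _ask(self, path: str, user: str, force_body: bool) -> OriginReply:
        self.origin_calls += 1
        return self.origin(path, user, force_body)

    def handle(self, path: str, user: str) -> bytes:
        reply = self._ask(path, user, False)       # always ask the server first
        if reply.body is None:                     # case 1: etag only
            cached = self.cache.get(reply.etag)
            if cached is not None:                 # case 1.1: cache hit
                return cached
            reply = self._ask(path, user, True)    # case 1.2: miss, ask again
            if reply.body is None:
                raise RuntimeError("origin ignored the request for a body")
        self.cache[reply.etag] = reply.body        # case 1.2 or 2: store
        return reply.body


def make_toy_origin() -> Origin:
    """Toy origin that remembers path -> etag and renders on demand."""
    known: Dict[str, str] = {}

    def origin(path: str, user: str, force_body: bool) -> OriginReply:
        if not force_body and path in known:
            return OriginReply(known[path], None)
        body = f"content of {path}".encode()
        etag = '"' + hashlib.sha256(body).hexdigest()[:32] + '"'
        known[path] = etag
        return OriginReply(etag, body)

    return origin


if __name__ == "__main__":
    proxy = EtagCachingProxy(make_toy_origin())
    path = "/catalog/123/item/456"
    proxy.handle(path, "U1")   # case 2: content computed, proxy stores it
    proxy.handle(path, "U2")   # case 1.1: etag only, served from the proxy cache
    proxy.cache.clear()        # simulate eviction
    proxy.handle(path, "U2")   # case 1.2: etag only, miss, second request
    print(proxy.origin_calls)  # 4
```

The counter makes the cost argument visible: case 1.1 costs one cheap round trip, and only a proxy-side miss (case 1.2) costs two.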

For simplicity, I left out the handling of the if-none-match header, which is rather obvious.

My reason for this is that the most common case 1.1 can be implemented very efficiently in the server (using its cache mapping requests to etags; the content isn't cached in the server), so that most requests can be handled without the server dealing with the response content. This should be better than first getting the content from a side cache and then serving it.

In case 1.2, there are two requests to the server, which sounds bad, but is no worse than the server asking a side cache and getting a miss.

Q1: I wonder how to map the first request to HTTP. In case 1, it's like a HEAD request; in case 2, it's like a GET. The decision between the two is up to the server: if it can produce the etag without computing the content, it's case 1; otherwise, it's case 2.

Q2: Is there a reverse proxy doing something like this? I've read about nginx, HAProxy and Varnish, and none of them seems to do it. This leads me to Q3: Is this a bad idea? Why?

Q4: If not, then which existing proxy is easiest to adapt?

An Example

A GET request like /catalog/123/item/456 from user U1 was served with some content C1 and etag: 777777. The proxy stored C1 under the key 777777.

Now the same request comes from user U2. The proxy forwards it, the server returns just etag: 777777, and the proxy is lucky: it finds C1 in its cache (case 1.1) and sends it to U2. In this example, neither the clients nor the proxy knew the expected result in advance.

The interesting part is how the server could know the etag without computing the answer. For example, it could have a rule stating that requests of this form return the same result for all users, provided the given user is allowed to see it. So when the request from U1 came, it computed C1 and stored the etag under the key /catalog/123/item/456. When the same request came from U2, it just verified that U2 is permitted to see the result.
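A sketch of that server side, under the example's assumption that the result for this path is identical for every permitted user: the server keeps a map from request path to etag (no content) and, when the etag is known, only runs the permission check. `EtagServer`, `Reply`, the `allowed` permission table and `invalidate` are hypothetical; how the server learns that the data changed is left open, as in the question:

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class Reply:
    etag: str
    body: Optional[bytes]  # None: etag only, the proxy must supply the content


class EtagServer:
    def __init__(self, render: Callable[[str], bytes],
                 allowed: Dict[str, Set[str]]) -> None:
        self.render = render                     # the expensive content computation
        self.allowed = allowed                   # user -> permitted paths (hypothetical)
        self.request_etags: Dict[str, str] = {}  # path -> etag of last computed result

    def handle(self, path: str, user: str, force_body: bool = False) -> Reply:
        # The cheap permission check runs even when the etag is already known.
        if path not in self.allowed.get(user, set()):
            raise PermissionError(f"{user} may not see {path}")
        etag = self.request_etags.get(path)
        if etag is not None and not force_body:
            return Reply(etag, None)             # no content computed at all
        body = self.render(path)
        etag = '"' + hashlib.sha256(body).hexdigest()[:32] + '"'
        self.request_etags[path] = etag
        return Reply(etag, body)

    def invalidate(self, path: str) -> None:
        # To be called whenever the data behind `path` changes.
        self.request_etags.pop(path, None)


if __name__ == "__main__":
    renders = []

    def render(path: str) -> bytes:
        renders.append(path)
        return f"page {path}".encode()

    p = "/catalog/123/item/456"
    server = EtagServer(render, {"U1": {p}, "U2": {p}})
    r1 = server.handle(p, "U1")  # computes C1 and remembers its etag
    r2 = server.handle(p, "U2")  # permission check only
    print(r2.body is None, r2.etag == r1.etag, len(renders))  # True True 1
```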

What you describe is a conditional GET in HTTP. The client does a GET with specific HTTP headers telling the server to reply with content only if a specific condition matches or does not match, such as one based on a validity date or an ETag. — Patrick Mevzek, Nov 29 '17 at 23:09
@PatrickMevzek Then my description is confusing. I'm aware of conditional GET, and that's something different. It assumes that the initiator "guesses" the probable `etag` of the response (it may even send [more than one](https://stackoverflow.com/q/40186498/581205) in the `if-none-match` header). +++ Here, the proxy queries the server without guessing, and the server usually (case 1) responds with the `etag` only, hoping that the proxy gets a cache hit (case 1.1). There's also the possibility of a second query (case 1.2). — maaartinus, Nov 29 '17 at 23:36
No, it doesn't assume that the client guesses anything, since ETag values are opaque by design. The client sends an ETag value it has in its cache, related to the URL it queries. — Patrick Mevzek, Nov 30 '17 at 10:32
@PatrickMevzek By "guessing", I mean sending the recently received `etag`. It's "guessing" in the sense that it may be right or wrong. +++ In my example above, both clients send no `etag`, as it's their first access. The proxy may have seen many `etag`s, but it has no idea which one may apply to a given request, as the response is user-specific and the proxy lacks the corresponding logic. — maaartinus, Dec 01 '17 at 00:53
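For contrast, here is roughly how a server evaluates the conditional GET discussed in these comments (RFC 7232, section 3.2): the condition fails, producing a 304 for GET/HEAD, only when the request already carries an entity-tag that matches the current one under the weak comparison. This is a simplified sketch with regex parsing; it illustrates why a proxy with no candidate etag for the request has nothing useful to put in `If-None-Match`:

```python
import re
from typing import Optional

# entity-tag = [ "W/" ] DQUOTE *etagc DQUOTE  (RFC 7232, section 2.3)
_ETAG = re.compile(r'(W/)?"([^"]*)"')


def opaque_tag(etag: str) -> str:
    m = _ETAG.fullmatch(etag.strip())
    if m is None:
        raise ValueError(f"not an entity-tag: {etag!r}")
    return m.group(2)


def should_send_304(if_none_match: Optional[str], current_etag: Optional[str]) -> bool:
    """True when a GET/HEAD carrying this If-None-Match header should get 304.

    current_etag is the etag of the selected representation, or None if
    there is no current representation (an assumption of this sketch).
    """
    if if_none_match is None or current_etag is None:
        return False  # no condition, or nothing to compare against
    if if_none_match.strip() == "*":
        return True   # "*" matches any current representation
    current = opaque_tag(current_etag)
    # Weak comparison: the W/ prefix is ignored on both sides.
    return any(tag == current for _, tag in _ETAG.findall(if_none_match))


if __name__ == "__main__":
    print(should_send_304('W/"abc", "def"', '"abc"'))  # True
    print(should_send_304(None, '"abc"'))              # False
```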

score 2 · Accepted Answer · answered Nov 30 '17 at 15:44

Q1: It is a GET request. The server can answer with a "304 Not Modified" without a body.

Q2: openresty (nginx with some additional modules) can do it, but you will need to implement some logic yourself (see more detailed description below).

Q3: This sounds like a reasonable idea given the information in your question. Just some food for thought:

- You could also split the page into user-specific and generic parts, which can be cached independently.
- You shouldn't expect the cache to keep the calculated responses forever. So if the server returns a 304 Not Modified with etag: 777777 (as per your example) but the cache doesn't know about it, you should have an option to force rebuilding the answer, e.g. with another request carrying a custom header X-Force-Recalculate: true.
- Not exactly part of your question, but: make sure to set a proper Vary header to prevent caching issues.
- If this is only about permissions, you could maybe also put the permission info in a signed cookie. The cache could derive the permissions from the cookie without asking the server, and the cookie is tamper-proof due to the signature.
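The signed-cookie idea from the last point, sketched with HMAC-SHA256 from the Python standard library. The cookie layout (base64 JSON payload, a dot, a hex signature), the shared `SECRET` and the expiry field are assumptions of this sketch; the expiry only bounds how long a revoked permission keeps working, which is the weakness raised in the comments below:

```python
import base64
import hashlib
import hmac
import json
from typing import Iterable, Optional, Set

SECRET = b"hypothetical-secret-shared-by-server-and-proxy"


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def make_cookie(perms: Iterable[str], expires_at: int) -> str:
    data = json.dumps({"perms": sorted(perms), "exp": expires_at}).encode()
    payload = base64.urlsafe_b64encode(data).decode()
    return f"{payload}.{_sign(payload)}"


def read_cookie(cookie: str, now: int) -> Optional[Set[str]]:
    """Return the permissions if the cookie is authentic and unexpired, else None."""
    payload, _, sig = cookie.rpartition(".")
    if not payload:
        return None
    # Constant-time comparison, so the signature can't be probed via timing.
    if not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        return None
    data = json.loads(base64.urlsafe_b64decode(payload))
    if data["exp"] < now:
        return None  # expired: the proxy must ask the server again
    return set(data["perms"])
```

The proxy needs the same secret as the server, so this only fits when both are under the same control.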

Q4: I would use openresty for this, specifically the lua-resty-redis module. Put the cached content into a redis key-value-store with the etag as key. You'd need to code the lookup logic in Lua, but it shouldn't be more than a couple of lines.
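Redis can be configured to evict keys under memory pressure (`maxmemory` together with an LRU `maxmemory-policy`), which is exactly why case 1.2 can't be avoided. As an in-process stand-in for such a store, here is a hypothetical byte-bounded LRU map in Python; it does not talk to Redis and is not the Lua lookup code the answer refers to:

```python
from collections import OrderedDict
from typing import Optional


class EtagLRU:
    """Bounded etag -> body store; least recently used entries are evicted first."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self._data: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, etag: str) -> Optional[bytes]:
        body = self._data.get(etag)
        if body is not None:
            self._data.move_to_end(etag)  # mark as most recently used
        return body

    def put(self, etag: str, body: bytes) -> None:
        if len(body) > self.max_bytes:
            return  # too large to cache at all
        old = self._data.pop(etag, None)
        if old is not None:
            self.used -= len(old)
        self._data[etag] = body
        self.used += len(body)
        while self.used > self.max_bytes:
            _, evicted = self._data.popitem(last=False)  # oldest entry first
            self.used -= len(evicted)
```

A `get` returning `None` here corresponds to case 1.2: the proxy must ask the server again for the content.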

Thanks a lot! Some details: I don't think the server is allowed to respond without a body, since the request is not conditional: the proxy sends no `if-none-match` header. In general, it can't send one, as every `etag` it has cached would qualify for inclusion in the header. +++ Concerning `X-Force-Recalculate: true`, that's my case 1.2 (I wasn't clear about it). +++ Permissions in a signed cookie would be an excellent idea if they couldn't be revoked and if there weren't so many of them. There's content where it's only about permissions and other content where it's more complicated. +++ I'll look into openresty. — maaartinus, Dec 01 '17 at 22:27
Regarding the 304, you're right: `if-none-match` would be mandatory as per [RFC7232 4.1](https://tools.ietf.org/html/rfc7232#section-4.1), but I guess it is the best option for modeling your requirements in the spirit of HTTP semantics. — Bernhard, Dec 02 '17 at 08:55
