21

I want my origin to be able to see the real User-Agent header, e.g. Gecko/20100101 Firefox/62.0, not Amazon CloudFront.

In the Behaviors tab I can whitelist the User-Agent header so that it's passed to the origin correctly. However, CloudFront then caches content per User-Agent, meaning that users visiting the CloudFront endpoint from different browsers force CloudFront to go back to the origin.

Is there any way to configure CloudFront to pass some headers to the origin, but not necessarily cache against them?

EDIT: I've got a similar problem with the Accept-Language header. I want to pass it to the origin, but I don't want to cache against it. The assets that I am caching are not language-dependent; however, the non-cacheable content does depend on the Accept-Language header.

Tom Raganowicz
  • You mentioned CloudFlare twice but the question seems to be all about CloudFront. Clarify? – Michael - sqlbot Oct 24 '18 at 12:47
  • 2
    What is the point of sending a header to the origin if you are not caching against it? If I visit the site from a different browser, the origin sees nothing at all because the cached content is returned... but if that is acceptable, then the origin did not need to know my User-Agent because it didn't use that information to modify the response, and it wasn't for logging or statistical purposes, since the origin sees only a fraction of requests anyway. So... what problem are you actually trying to solve? Knowing that information should lead us to a proper solution. – Michael - sqlbot Oct 24 '18 at 12:52
  • 3
    I meant 'CloudFront', sorry; I've edited my question. Right, so my web application has cacheable assets (images, JS, CSS) as well as non-cacheable GET actions that return personalized data to the user. There is still a huge benefit in using a CDN-like service. I am using `User-Agent` for statistics when storing logs in Kibana. In my case I would like to know the `User-Agent`, as it's additional information which doesn't affect the content. CloudFront makes that impossible to achieve. Maybe my approach is not ideal, but it's how this application evolved and I am looking for tools to achieve that. – Tom Raganowicz Oct 24 '18 at 17:18

4 Answers

23

You can use Lambda@Edge functions (https://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html) assigned to your CloudFront distribution. You would need two functions:

  1. A Viewer-Request event handler that reads the User-Agent header and copies it to, e.g., X-My-User-Agent. The Viewer-Request handler is invoked before the request from the client reaches your CloudFront distribution.
  2. An Origin-Request event handler that reads X-My-User-Agent and uses it to replace User-Agent. The Origin-Request handler is invoked when CloudFront did not find the requested page in its cache and sends the request to the origin.

Please note that you should NOT add User-Agent to the CloudFront whitelist:

You can configure CloudFront to cache objects based on values in the Date and User-Agent headers, but we don't recommend it. These headers have a lot of possible values, and caching based on their values would cause CloudFront to forward significantly more requests to your origin.

Ref: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#request-custom-headers-behavior

Example of a Viewer-Request handler (Lambda@Edge functions can be written only in Node.js or Python, Ref: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-requirements-limits.html#lambda-requirements-lambda-function-configuration):

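'use strict';

// Viewer-Request handler: copy User-Agent into a custom header before CloudFront replaces it.
exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;
  const customUserAgentHeaderName = 'X-My-User-Agent';
  const userAgentHeader = headers['user-agent'];

  // Some clients send no User-Agent; only copy the header when it is present.
  if (userAgentHeader && userAgentHeader.length > 0) {
    headers[customUserAgentHeaderName.toLowerCase()] = [
      {
        key: customUserAgentHeaderName,
        value: userAgentHeader[0].value
      }
    ];
  }

  callback(null, request);
};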

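Example of an Origin-Request handler:

'use strict';

// Origin-Request handler: restore the original User-Agent from the custom header.
exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;
  const customUserAgentHeaderName = 'X-My-User-Agent';
  const savedUserAgentHeader = headers[customUserAgentHeaderName.toLowerCase()];

  // Leave the request untouched if the custom header did not reach this handler.
  if (savedUserAgentHeader && savedUserAgentHeader.length > 0) {
    headers['user-agent'] = [
      {
        key: 'User-Agent',
        value: savedUserAgentHeader[0].value
      }
    ];
  }

  callback(null, request);
};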
Lucas Carneiro
illagrenan
  • 1
    An alternative to `realUserAgent` as the saved value is to use a stringified boolean value, i.e. `'true'` or `'false'`. Then change the header from `x-my-user-agent` to `isbot`. This way, if CloudFront includes custom headers in its cache whitelist, you're only caching based on a `'true'` or `'false'` value and can still get performant caching. – Elliott Apr 03 '19 at 21:42
  • Also, if trying to log from these edge lambdas, know that "Lambda creates CloudWatch Logs log streams in the CloudWatch Logs regions **closest to the locations where the function** is executed. The format of the name for each log stream is _/aws/lambda/us-east-1.function-name_ where **function-name** is the name that you gave to the function when you created it. So ensure you are checking the cloudwatch logs in the correct **REGION**." Taken from [here](https://stackoverflow.com/a/46563157/4561506) – Elliott Apr 03 '19 at 21:46
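
The `isbot` idea from the comment above could look roughly like the following Viewer-Request handler. This is only a sketch: the `BOT_PATTERN` regex and the `classifyUserAgent` helper are hypothetical names made up for illustration, and the pattern is far too small for real bot detection.

```javascript
'use strict';

// Hypothetical, deliberately tiny pattern; real bot detection needs a maintained list.
const BOT_PATTERN = /bot|crawler|spider/i;

// Returns the string 'true' or 'false', because header values must be strings.
function classifyUserAgent(userAgent) {
  return String(BOT_PATTERN.test(userAgent || ''));
}

// Viewer-Request handler: replace the high-cardinality User-Agent with a two-valued header.
function handler(event, context, callback) {
  const request = event.Records[0].cf.request;
  const headers = request.headers;
  const userAgentHeader = headers['user-agent'];
  const userAgent =
    userAgentHeader && userAgentHeader.length > 0 ? userAgentHeader[0].value : '';

  headers['isbot'] = [{ key: 'isbot', value: classifyUserAgent(userAgent) }];
  callback(null, request);
}

exports.handler = handler;
```

Because `isbot` can only take two values, whitelisting it adds at most two cache variants per object instead of one per distinct User-Agent string.
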
13

It seems that around July 2020 Amazon introduced a new feature, "Origin Request policies". You should be able to use it (without getting bogged down in Lambda@Edge): https://aws.amazon.com/blogs/networking-and-content-delivery/amazon-cloudfront-announces-cache-and-origin-request-policies/

TLDR:

Over time, we’ve seen numerous cases in which the new functionality could be useful for customers. Examples such as:

  • Forwarding information such as the User-Agent to the origin for analytics/logging but without serving different content variants based on device type (now you can forward the user-agent header and exclude it from the cache-key)

^ this is your use-case :)
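
As a rough sketch of what such a policy might contain, the snippet below builds a plain object in the shape of the CloudFront `OriginRequestPolicyConfig` structure as I understand it. The field names should be checked against the current API reference. The policy name and the `buildOriginRequestPolicyConfig` helper are hypothetical, and nothing here calls AWS.

```javascript
'use strict';

// Builds a config object for an origin request policy that forwards only the
// listed viewer headers. Field names are assumed from the CloudFront API docs.
function buildOriginRequestPolicyConfig(name, headerNames) {
  return {
    Name: name,
    Comment: 'Forward selected viewer headers to the origin',
    HeadersConfig: {
      HeaderBehavior: 'whitelist',
      Headers: { Quantity: headerNames.length, Items: headerNames },
    },
    CookiesConfig: { CookieBehavior: 'none' },
    QueryStringsConfig: { QueryStringBehavior: 'none' },
  };
}

// Hypothetical policy name; the two headers are the ones from the question.
const policyConfig = buildOriginRequestPolicyConfig('forward-ua-and-language', [
  'User-Agent',
  'Accept-Language',
]);

console.log(JSON.stringify(policyConfig, null, 2));
```

The important part is that these headers appear only in the origin request policy. The cache policy attached to the same behavior should not list them, so they stay out of the cache key.
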

domis86
  • 2
    It's funny that I just read docs on AWS that told me it wasn't possible, but going to the Cloudfront Console it clearly is with the (relatively) new Origin Request Policies. I'd say this answer has now become the correct one... – JCP Dec 15 '21 at 01:17
  • maybe AWS docs are lagging behind their new features :) – domis86 Dec 15 '21 at 11:25
3

If the requests are cached across different user agents, then in case of a hit the real user agent will not be passed to the origin at all. CloudFront will just return the cached response.

You mentioned that you would like to send the user-agent information to Elasticsearch. Unless you are only interested in the requests that miss the cache, you cannot rely on the logs collected from the origin application.

If you have Lambda@Edge send the user agent as realUserAgent, but the User-Agent header is itself not a caching parameter, the origin will still not receive that data in case of a Hit.

The only solution that I see here is to use the access logs generated by CloudFront. The CloudFront access logs contain not only the user agent but also IP addresses and other useful information. This data is logged for both Hit and Miss. It is also easy to set up Logstash to send this information to Elasticsearch.
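
To illustrate, here is a minimal sketch of pulling the user agent out of CloudFront standard log text. It assumes the tab-separated format with a `#Fields:` header line and percent-encoded values. The sample log is made up and contains only a few of the real columns, and `parseAccessLog` and `safeDecode` are hypothetical helper names.

```javascript
'use strict';

// Parses log text into objects keyed by the column names from the "#Fields:" line.
function parseAccessLog(text) {
  let fields = [];
  const records = [];
  for (const line of text.split('\n')) {
    if (line.startsWith('#Fields:')) {
      fields = line.slice('#Fields:'.length).trim().split(/\s+/);
    } else if (line.trim() !== '' && !line.startsWith('#')) {
      const values = line.split('\t');
      const record = {};
      fields.forEach((name, i) => {
        record[name] = values[i];
      });
      records.push(record);
    }
  }
  return records;
}

// Log values are percent-encoded; fall back to the raw value if decoding fails.
function safeDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    return value;
  }
}

// Made-up sample with a reduced set of columns, for illustration only.
const sample = [
  '#Version: 1.0',
  '#Fields: date time c-ip cs-uri-stem cs(User-Agent) x-edge-result-type',
  '2018-10-24\t12:47:00\t192.0.2.10\t/app.js\tMozilla/5.0%20Gecko/20100101%20Firefox/62.0\tHit',
].join('\n');

for (const record of parseAccessLog(sample)) {
  console.log(safeDecode(record['cs(User-Agent)']), record['x-edge-result-type']);
}
```

Reading the column names from the `#Fields:` line rather than hard-coding positions keeps the parser working if the set of logged fields changes.
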

cnvzmxcvmcx
0

This might be a simple solution. If you only need User-Agent for a particular group of URLs, for example /tracking/a and /tracking/b, create a new distribution for this path [tracking*] and whitelist User-Agent for this distribution only. That way you are not changing AWS caching for all URLs, only for this path.

Ibrahimsha
  • 1
    I like that. Seems more economical than running an Edge lambda for every request, when you only need the user-agent for one route and not the 50 others in the API. – user14764 Nov 02 '20 at 11:19