
Does CloudFront require special settings to trigger a log?

I have the following flow:

Devices -> CloudFront -> API Gateway -> Lambda Function

which works, but CloudWatch doesn't seem to create logs for the Lambda function (or API Gateway). However, the following flow creates logs:

Web/Curl -> API Gateway -> Lambda Function
Mars
  • Actually, you do not need to set up CloudFront -> API Gateway explicitly. In the API Gateway configuration, you can enable caching, which is also powered by CloudFront. https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-caching.html – kkpoon Jun 15 '18 at 01:12
  • @kkpoon Sorry, I don't see the connection. I don't want the responses cached. I just want to be able to debug lambda functions called by devices – Mars Jun 15 '18 at 01:19
  • I see. Did your device send a GET request to the endpoint? It might have hit the CloudFront cache, so the Lambda was not triggered. – kkpoon Jun 15 '18 at 01:36
  • @kkpoon I also thought it may be the cache, but the response changed and there was still no log – Mars Jun 15 '18 at 01:44
  • "Lambda functions from CloudFront" -- are you actually referring to Lambda@Edge rather than API Gateway? I don't think you are, but that is the only situation where CloudFront can cause Lambda logging to be different, because the Lambda function will run in the region nearest the viewer/device, and log in that region. But CloudFront -> API Gateway -> Lambda has no such interaction. – Michael - sqlbot Jun 15 '18 at 09:14
  • @Michael-sqlbot Thanks, but unfortunately I’m not. I also checked the logs for every region, but unfortunately, nothing. – Mars Jun 19 '18 at 00:05
  • @Mars the [execution role](https://docs.aws.amazon.com/lambda/latest/dg/intro-permission-model.html) of the Lambda function needs permission to actually write to the logs. Do you have that? – Michael - sqlbot Jun 19 '18 at 00:08
  • @Michael-sqlbot It's writing logs, just not when triggered through cloudfront – Mars Jun 19 '18 at 00:12
  • Sorry, I should have remembered that. Capture the response headers when making the request through CloudFront and look for `X-Cache` and `Age`. What do you find? – Michael - sqlbot Jun 19 '18 at 06:28
  • @Michael-sqlbot Unfortunately, I don't have access to the Cloudfront side. However, it appears the response was changing, so I don't think it was the cache. I've asked the other side to double check though. I'll update you if I ever get an answer – Mars Jun 19 '18 at 06:54
  • You can't make a request to CloudFront using curl? And you don't have access to the CloudFront logs? There's really no other explanation that comes to mind, other than cached responses. CloudFront is not tightly coupled to API Gateway or any other origin server at the point when requests are being made -- the target service (which doesn't even have to be in AWS at all, it can be anywhere) just sees an ordinary incoming HTTP/S request from the Internet. – Michael - sqlbot Jun 19 '18 at 07:17
  • @Michael-sqlbot I checked. There is an X-Cache header, but it's consistently "Miss from cloudfront." Caching should most likely be off... If it's off, would it always be "Miss from cloudfront"? Or would the header simply not appear? – Mars Jun 19 '18 at 07:57
  • If you see `X-Cache: Miss from CloudFront` in a response, then the CloudFront distribution to which you are directly connecting definitely did not serve that response from its cache, and so can be ruled out. You have an unusual issue, here, but intuition suggests that it's going to come down to the system not being provisioned the way you expect... such as the CloudFront distro pointing to the wrong API endpoint or wrong account (e.g. pointing to an endpoint in your QA account rather than your staging account, so the behavior is similar enough to be convincing). – Michael - sqlbot Jun 19 '18 at 10:20
  • @Michael-sqlbot I'm seeing X-Cache miss from curl, but are there any other client-side settings that would affect that? Like proxy or client-side security settings? If not, I guess there's nothing left except to have the other side check the CloudFront settings – Mars Jun 20 '18 at 00:18
  • @Michael-sqlbot There are only 2 possible targets for the API that would give convincing behavior, but neither produced logs. I guess I should probably go ask on the AWS forums instead! – Mars Jun 20 '18 at 00:30
  • I don't think you'll have any better luck, there. There's really nothing to set in CloudFront that could trigger an API Gateway-invoked Lambda function not to log requests that it would otherwise log. Without access to the CloudFront settings, so that you can confirm things like the configured Origin Domain Name, you're at a disadvantage. – Michael - sqlbot Jun 20 '18 at 01:00
  • @Michael-sqlbot Just realized I could check if my curl -> CF -> API -> lambda calls were getting logged, and they were. I can't really imagine anything other than client-side cache settings having some kind of effect.. – Mars Jun 20 '18 at 01:43
  • @Michael-sqlbot In the end, changing the Maximum TTL and Default TTL on the CloudFront end to 0 solved it. I don't know why the cache was missing for curl and web browsers, but it seems it was the CloudFront cache after all. Do you want to update an answer? – Mars Jun 22 '18 at 00:11
  • @Mars that's interesting, but it defies explanation... more investigation is needed in order to figure out how it could possibly be the case. Unfortunately, it sounds like you don't have access to the full stack. There's one possibility that comes to mind, an interaction between CloudFront and API Gateway, that *shouldn't* happen... but I'll try to replicate this. Until then, I'm not sure what could be added to the answer. – Michael - sqlbot Jun 22 '18 at 00:42
  • @Michael-sqlbot Could it not just be something to do with the curl/browser's header settings? It seems default curl should cache, and I even added the Cache-Control header manually, but no luck. Could it also have something to do with my company's proxy? Either way, I'd be curious to hear what you find out! – Mars Jun 22 '18 at 00:49
  • Your company's proxy could very well be the culprit... the problem, there, is that changing the TTLs in CloudFront would be invisible to the proxy. – Michael - sqlbot Jun 22 '18 at 01:55

1 Answer


In comments, above, we seem to have arrived at a conclusion that unanticipated client-side caching (or caching somewhere between the client and the AWS infrastructure) may be a more appropriate explanation for the observed behavior, since there is no known mechanism by which an independent CloudFront distribution could access a Lambda function via API Gateway and cause those requests not to be logged by Lambda.

So, I'll answer this with a way to confirm or reject this hypothesis.

CloudFront injects a header into both requests and responses, X-Amz-Cf-Id, containing opaque tokens that uniquely identify the request and the response. Documentation refers to these as "encrypted," but for our purposes, they're simply opaque values with a very high probability of uniqueness.

In spite of having the same name, the request header and the response header are actually two uncorrelated values (they don't match each other on the same request/response).

The origin-side X-Amz-Cf-Id, which is sent to the origin server in the request, is only really useful to AWS engineers, for troubleshooting.

But the viewer-side X-Amz-Cf-Id returned by CloudFront in the response is useful to us, because not only is it unique to each response (even responses from the CloudFront cache have different values each time you fetch the same object) but it also appears in the CloudFront access logs as x-edge-request-id (although the documentation does not appear to unambiguously state this).

Thus, if the client side sees duplicate X-Amz-Cf-Id values across multiple responses, there is something either internal to the client or between the client and CloudFront (in the client's network or ISP) that is causing cached responses to be seen by the client.
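As a rough sketch of that check, assuming Python on the client side: `fetch_cf_id` and `find_repeated_ids` are hypothetical helper names, not part of any AWS tooling. The first reads the `X-Amz-Cf-Id` response header from one request, and the second reports any value seen more than once.

```python
from collections import Counter
from urllib.request import Request, urlopen


def fetch_cf_id(url):
    """Return the X-Amz-Cf-Id response header for one GET request, or None."""
    request = Request(url, method="GET")
    # urlopen raises HTTPError on 4xx/5xx responses; handle that if needed.
    with urlopen(request) as response:
        return response.headers.get("X-Amz-Cf-Id")


def find_repeated_ids(cf_ids):
    """Return the IDs that occur more than once, ignoring missing values."""
    counts = Counter(cf_id for cf_id in cf_ids if cf_id)
    return sorted(cf_id for cf_id, n in counts.items() if n > 1)
```

To use it, call `fetch_cf_id` several times against the distribution's URL from the affected client or network, then pass the collected values to `find_repeated_ids`. An empty result suggests each response was generated by CloudFront separately; any repeated value points at a cache in front of CloudFront.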

Comparing the X-Amz-Cf-Id values the client sees across multiple responses is useful, since they should never repeat. Matching them against the CloudFront logs is also useful, since that confirms the timestamp of the request for which CloudFront actually generated a particular response.
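The log side of that correlation might look like the sketch below. It assumes the CloudFront standard access log format, where a `#Fields:` header line names the columns and data lines are tab-separated; `index_by_request_id` is a hypothetical helper name.

```python
def index_by_request_id(log_lines):
    """Map x-edge-request-id -> (date, time, x-edge-result-type) per log entry."""
    fields = []
    index = {}
    for line in log_lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # Column names in the header line are separated by spaces.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        # Data lines are tab-separated, in the order given by #Fields.
        row = dict(zip(fields, line.split("\t")))
        request_id = row.get("x-edge-request-id")
        if request_id:
            index[request_id] = (
                row.get("date"),
                row.get("time"),
                row.get("x-edge-result-type"),
            )
    return index
```

Looking up a client-side X-Amz-Cf-Id in the returned dict gives the date and time at which CloudFront generated that response, and whether it logged a hit or a miss. Log files are typically delivered gzip-compressed, so they can be opened with `gzip.open(path, "rt")` and passed in directly.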

tl;dr: observing the same X-Amz-Cf-Id in more than one response means caching is occurring outside the boundaries of AWS.


Note that even though CloudFront allows min/max/default TTLs to impact how long CloudFront will cache the object, these settings don't impact any downstream or client caching behavior. The origin should return correct Cache-Control response headers (e.g. private, no-cache, no-store) to ensure correct caching behavior throughout the chain. If the origin behavior can't be changed, then Lambda@Edge origin response or viewer response triggers can be used to inject appropriate response headers -- see this example on Server Fault.
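A minimal sketch of such a trigger, assuming a Python Lambda@Edge function attached as an origin response (or viewer response) trigger. The `no-store` value is only an illustrative choice; pick the directive that matches how the responses should actually be cached.

```python
def lambda_handler(event, context):
    """Set Cache-Control on the response passing through CloudFront."""
    response = event["Records"][0]["cf"]["response"]
    # Lambda@Edge keys headers by lowercase name; each value is a list of
    # {"key": ..., "value": ...} dicts.
    response["headers"]["cache-control"] = [
        {"key": "Cache-Control", "value": "no-store"}
    ]
    return response
```

This overwrites any Cache-Control header the origin sent, so downstream proxies and clients see the injected value regardless of what API Gateway returned.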

Note also that CloudFront caches 4xx/5xx error responses for 5 minutes by default. See Amazon CloudFront Latency for an explanation and steps to disable this behavior, if desired. This feature is designed to give the origin server a break, and not bombard it with requests that are presumed to continue to fail, anyway. This behavior may cause various problems in testing as well as production, so there are cases where it should be disabled.

Michael - sqlbot
  • Thank you. Very thorough! It will take a while and a bunch of paperwork before I can confirm or disprove the cache-control theory, but thank you for the help! – Mars Jun 21 '18 at 00:16