
We have a Google Cloud Function in production that returns the correct redirects for a now-defunct site using a very simple Python script. It sits behind a CDN that caches the responses so that the function is not triggered more than necessary.

The function itself works as intended. However, we have noticed that when a request carries a specific User-Agent (Bingbot), Google Cloud Functions injects a Cache-Control: private header into the response, independently of the function code (which does not set a Cache-Control header on the 301 response it returns). As a result, the CDN passes every Bingbot request through to the backend, which makes our Cloud Function usage, and therefore our costs, much higher than they would otherwise be.

The Bingbot User-Agent also changes the Content-Encoding and Transfer-Encoding of the response, although we are less concerned about this.

We tested this by stripping the User-Agent header at the CDN before the request reaches the backend (the function), and confirmed that without the Bingbot User-Agent we get 0 persistent passes. Allowing the header back through recreated the issue: far more passes than we should be seeing.

We now strip the User-Agent header from all requests, which works around the issue. We are still concerned that this behaviour is undocumented, and we have no information about the other circumstances in which Cloud Functions may inject or manipulate response headers based on request headers.

To confirm this isn't coming from our Python script, the relevant portion returning our response is as follows:

try:
     return flask.redirect(redirect_dict[request.path], code=301)
except:
     return flask.redirect(os.environ.get('FALLBACK_URL'), code=301)

Curl with Bingbot UA (actual URL & host obscured):

curl -v -X GET $function/$path -H 'User-Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)' -H $host

And the relevant response:

< HTTP/1.1 301 Moved Permanently
< Content-Type: text/html; charset=utf-8
< Function-Execution-Id: jzlrm3k4ndhv
< Location: $redirectURL
< X-Cloud-Trace-Context: 83841aa8390d4ea4c1c8349c3aca21be
< Content-Encoding: gzip
< Date: Mon, 20 May 2019 13:02:22 GMT
< Server: Google Frontend
< Cache-Control: private
< Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"
< Transfer-Encoding: chunked

Without Bingbot UA, the response is:

< HTTP/1.1 301 Moved Permanently
< Content-Type: text/html; charset=utf-8
< Function-Execution-Id: t8frc9wsdvzp
< Location: $redirectURL
< X-Cloud-Trace-Context: 1f817eecdc84ad4a7542fba5898caf50;o=1
< Date: Mon, 20 May 2019 13:02:37 GMT
< Server: Google Frontend
< Content-Length: 319
< Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"
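
To make the difference between the two responses explicit, here is a minimal Python sketch that diffs the two header sets pasted above. The helper names (`parse_headers`, `diff_headers`) and the choice of which headers to treat as per-request noise (`VOLATILE`) are assumptions made for illustration; they are not part of our function.

```python
# Header blocks copied verbatim from the two curl responses above.
BINGBOT_RAW = """
< HTTP/1.1 301 Moved Permanently
< Content-Type: text/html; charset=utf-8
< Function-Execution-Id: jzlrm3k4ndhv
< Location: $redirectURL
< X-Cloud-Trace-Context: 83841aa8390d4ea4c1c8349c3aca21be
< Content-Encoding: gzip
< Date: Mon, 20 May 2019 13:02:22 GMT
< Server: Google Frontend
< Cache-Control: private
< Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"
< Transfer-Encoding: chunked
"""

PLAIN_RAW = """
< HTTP/1.1 301 Moved Permanently
< Content-Type: text/html; charset=utf-8
< Function-Execution-Id: t8frc9wsdvzp
< Location: $redirectURL
< X-Cloud-Trace-Context: 1f817eecdc84ad4a7542fba5898caf50;o=1
< Date: Mon, 20 May 2019 13:02:37 GMT
< Server: Google Frontend
< Content-Length: 319
< Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"
"""

# Assumption: these headers differ on every request and are not of interest.
VOLATILE = {"date", "function-execution-id", "x-cloud-trace-context"}


def parse_headers(raw):
    """Parse `curl -v` response lines ('< Name: value') into a dict
    keyed by lowercase header name. The status line is skipped."""
    headers = {}
    for line in raw.strip().splitlines():
        line = line.lstrip("< ").strip()
        if ":" not in line:
            continue  # status line, e.g. 'HTTP/1.1 301 Moved Permanently'
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def diff_headers(a, b, ignore=VOLATILE):
    """Return (only_in_a, only_in_b, changed), skipping ignored headers."""
    only_a = {k: v for k, v in a.items() if k not in b and k not in ignore}
    only_b = {k: v for k, v in b.items() if k not in a and k not in ignore}
    changed = {
        k: (a[k], b[k])
        for k in a
        if k in b and a[k] != b[k] and k not in ignore
    }
    return only_a, only_b, changed


only_bingbot, only_plain, changed = diff_headers(
    parse_headers(BINGBOT_RAW), parse_headers(PLAIN_RAW)
)
print("Only with Bingbot UA:", only_bingbot)
print("Only without Bingbot UA:", only_plain)
print("Present in both but different:", changed)
```

Ignoring the per-request identifiers, the Bingbot response adds Cache-Control, Content-Encoding and Transfer-Encoding, and the other response carries Content-Length instead.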

We would expect the two responses to be identical, as we do not set any Cache-Control header ourselves. Clearly, varying the User-Agent causes Google Cloud Functions to inject additional headers, change the encoding and otherwise transform the response. Our concern is that there is no documentation or other information about this (unless I've missed it). Ideally, someone could point me to an explanation, or someone from Google could explain why this happens and which settings, if any, we could use to prevent it.

TorTurner
  • If you want a response from Google, I suggest going to Google directly: https://cloud.google.com/support/ – Doug Stevenson May 21 '19 at 15:35
  • Thanks Doug - I recognise what you're saying, but equally the hope is that someone within the community has come across this before and can point me towards an explanation, which would avoid us having to pay out to get an answer from Google due to their product doing something they haven't documented. If I do have to go down that route, I will be sure to post any response here for folks who may run into this in the future. – TorTurner May 21 '19 at 15:49
