I have a site at primary.example.com that serves a page which includes a JavaScript file from other.example.com.

That JavaScript makes an XHR (Ajax) GET request to other.example.com/data.json.
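To make the setup concrete, here is a minimal sketch of that request. It is illustrative rather than the script's actual code: the function name `loadData`, the `https` scheme, and the injectable `XHRImpl` parameter (there only so the sketch can run outside a browser) are assumptions.

```javascript
// Hypothetical helper: fetch data.json from other.example.com with XHR.
// XHRImpl defaults to the browser's XMLHttpRequest; it is a parameter
// only so that the sketch can be exercised outside a browser.
function loadData(onLoad, onError, XHRImpl = globalThis.XMLHttpRequest) {
  const xhr = new XHRImpl();
  xhr.open('GET', 'https://other.example.com/data.json');
  // The script does not set Origin itself. Origin is a forbidden header
  // name, so only the browser can add it to the request.
  xhr.onload = () => onLoad(xhr.status, xhr.responseText);
  xhr.onerror = () => onError();
  xhr.send();
  return xhr;
}
```

In the page this would be called as something like `loadData((status, body) => console.log(status, body), () => console.log('failed'))`.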

I believe this is a cross-origin (CORS) request, because the page's origin is primary.example.com while the request goes to other.example.com.

Therefore I am expecting the data.json request to include the Origin header. The browser should add this to the JavaScript request because it is going to a different origin than the one the page was loaded from.

All the browsers I have tested (Chrome, Firefox, WebKit) do this. But my server logs show lots of requests for data.json that the server at other.example.com denied because they did not include an Origin header. Why would that be?

Edit

I don't know if this is relevant, but it's just occurred to me that it might be: the initial page load is triggered by a 302 redirect from the other server.

The full process is this:

[Image: UML sequence diagram]

In this diagram, request ➐ is the Ajax request that is mostly, but not always, sent with an Origin header.

artfulrobot
  • A browser won’t send the Origin header when somebody navigates/browses directly to `https://other.example.com/data.json` and opens it in the browser. Browsers basically only send the Origin header in ajax requests made from frontend JavaScript code. And server-side runtimes and HTTP clients such as curl and wget never send the Origin header unless for some reason you manually add the Origin header to any requests you send with them. So if you’ve configured your server to deny requests that don’t have the Origin header, then you’re only allowing ajax requests from frontend JavaScript code. – sideshowbarker Mar 25 '20 at 08:48
  • Yes, that's right. (I've edited the question to try to make this explicit.) I only want to allow ajax requests; it is front-end JavaScript that is making the requests. I don't care about people messing around putting this URL in browsers or curl/httpie etc., but the error log suggests that a decent number of valid-looking Chrome/Windows 10 users are failing to access the resource because their browser is not sending the Origin header when the front-end JS makes the request. – artfulrobot Mar 25 '20 at 08:55
  • See https://stackoverflow.com/a/42242802/441757 for the details, but the gist of it is: Browsers always send the Origin header in cross-origin requests made from frontend JavaScript code. The only exception is if the calling code has set the request mode to 'no-cors'. But any code which does that won’t be able to actually access the response, so it’s very unlikely your server is getting a significant number of requests from 'no-cors'-mode code. – sideshowbarker Mar 25 '20 at 09:02
  • Thanks, that seems to confirm what I would *expect* but not what I'm seeing. I've added more to the question though as I realised there was another factor possibly at play. – artfulrobot Mar 25 '20 at 10:57
  • Not sure what else can be said about this without repeating what I wrote earlier; but to be a bit more specific: When frontend code uses XHR to make a cross-origin request, the Origin header is always added, no matter what. Same goes for any ajax method in any JS library, which are all pretty much built on top of XHR. The few that aren’t can only be built on top of fetch instead, and fetch always adds the Origin header to the request, except for one case, which is when the code has specified 'no-cors' mode. Whether the request is made after a redirect makes no difference. Nor does any other factor. – sideshowbarker Mar 25 '20 at 11:19
  • One other thing that’s important to be aware of in general is that browsers sometimes change the value of the Origin header to 'null'. One case when browsers do that is for cross-origin redirects. But there are other cases too; see the details in the answer at https://stackoverflow.com/a/42242802/441757. So if your server code is checking for an Origin header with a specific value, then in the case of cross-origin redirects, it’s not going to be finding the value it’s expecting; it’s going to be seeing 'null' instead. – sideshowbarker Mar 25 '20 at 11:26
  • I get what you're saying, and appreciate it. I'm just still left with the observed facts not matching the theory! I don't believe it's hackers/spammers. PHP calculates `empty($_SERVER['HTTP_ORIGIN'])` to be true. So it's not even set to `'null'`. Thanks for your time anyway. – artfulrobot Mar 25 '20 at 12:16

1 Answer

I believe this could be inbox scanners, link previewers or similar. I spot-checked 10 of the IP addresses that were doing this: they all belonged to Microsoft.

I'm now thinking that they have some automated process that follows links found during mail scanning (most people were sent the link to the page in an email), perhaps to check that the page looks legitimate, perhaps to generate a preview.
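For anyone wanting to repeat that spot check, a rough sketch of tallying the Origin-less requests per client IP follows. It assumes the log has already been parsed into objects with `ip` and `origin` fields; that shape and the name `countMissingOriginByIp` are hypothetical, and finding out who owns each address (e.g. via whois) is a separate manual step.

```javascript
// Hypothetical log entry shape: { ip: string, origin: string | undefined }.
// Returns [ip, count] pairs for requests that had no Origin header,
// ordered from the most to the least frequent address.
function countMissingOriginByIp(entries) {
  const counts = new Map();
  for (const { ip, origin } of entries) {
    // Skip requests that did carry an Origin header.
    if (origin !== undefined && origin !== '') continue;
    counts.set(ip, (counts.get(ip) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```

The addresses at the top of the result are the ones worth looking up first.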

I'm using PHP-FPM with nginx, and in these cases there simply is no Origin header: there is no key called `HTTP_ORIGIN` in the `$_SERVER` array.
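A missing header is a different case from an Origin of `null`, which the comments mention browsers can send after cross-origin redirects. The sketch below shows the distinction in JavaScript rather than the PHP actually running on the server. It assumes a Node-style headers object with lower-cased names; `classifyOrigin` and the allowed origin passed to it are hypothetical.

```javascript
// Hypothetical helper: classify a request by its Origin header.
// `headers` is assumed to be a plain object keyed by lower-cased
// header names, as Node's http module exposes in req.headers.
function classifyOrigin(headers, allowedOrigin) {
  const origin = headers['origin'];
  // No header at all (or an empty one): what the server logs showed.
  if (origin === undefined || origin === '') return 'missing';
  // The literal string "null": an opaque origin, not an absent header.
  if (origin === 'null') return 'null';
  return origin === allowedOrigin ? 'allowed' : 'other';
}
```

Logging the three non-allowed outcomes separately would make it easier to tell automated clients (no header) from browsers that sent `null`.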

I believe, as the only commenter so far says, that this should not happen: it is invalid CORS behaviour. So the fact that it is happening suggests that something other than normal browsing is generating these requests. Note that the vast majority of traffic is valid and does have the Origin header. We have also had no complaints that the site is broken. It's a petition for the protection of vulnerable people facing hardship during coronavirus, so you'd expect most people who visit the page to want to sign the petition (which requires CORS), and in my experience such people are usually pretty vocal if that does not work.

My conclusion: mail scanners.

artfulrobot