
I redirect all my websites from HTTP to HTTPS with:

<VirtualHost *:80>
  ServerName example.com
  RewriteEngine on
  RewriteCond %{HTTPS} !on
  RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</VirtualHost>

<VirtualHost *:443>
  ServerName example.com
  DocumentRoot /www/example.com
  SSLEngine on  
  ...
</VirtualHost>

I noticed that when navigating from anothersite.com and

  • clicking a link to https://example.com, JavaScript's document.referrer works and gives anothersite.com

  • clicking a link to http://example.com, JavaScript's document.referrer is empty!

How can I prevent document.referrer from vanishing when using an HTTP->HTTPS redirection via Apache?

Or should I perform the automatic HTTP->HTTPS redirection with another method to keep the referrer?

Basj

2 Answers


The Referer header is sent by the browser, and apparently the new request issued after the redirect doesn't carry over the header from the original request.

Since it's up to the browser whether to send this header, your options are limited, and I can't even guarantee that this one will work:

Utilize HSTS by adding

Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains"

(or similar; pick values that you like) to your HTTPS virtual host. You'll still miss the referrer of the first, redirected request, but any browser that comes back within two years (63072000 seconds) will connect over HTTPS right away.
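A minimal sketch of where that line fits, reusing the example.com virtual hosts from the question. It assumes mod_rewrite, mod_ssl and mod_headers are loaded, and it leaves out the certificate directives just as the question does. The header is placed only in the HTTPS host because browsers ignore Strict-Transport-Security when it arrives over plain HTTP.

```apache
# Plain-HTTP host: first-time visitors (and browsers without an HSTS entry)
# still land here and get redirected, so their referrer may still be lost.
<VirtualHost *:80>
  ServerName example.com
  RewriteEngine on
  RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</VirtualHost>

# HTTPS host: send HSTS here only; it is ignored if sent over plain HTTP.
<VirtualHost *:443>
  ServerName example.com
  DocumentRoot /www/example.com
  SSLEngine on
  # (certificate directives omitted, as in the question)
  # 63072000 s = 730 days; "always" also adds the header to error responses.
  Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains"
</VirtualHost>
```

Once a browser has stored the HSTS entry, it rewrites http://example.com links to https:// itself, so no round trip to the port-80 host happens for that browser.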

Careful: if you offer anything on HTTP only, it'll be unavailable to any browser that has ever seen (and honors) the HSTS header.

Also, there are many cases in which the header is or isn't sent, and they have changed over time in response to discovered vulnerabilities; numerous articles shed some light on all the conditions that make the header appear or disappear.

Check, then double-check, whether you fall into one of the cases in which the header is omitted. You can't assume that the header is there in the first place.

Olaf Kock
  • Thank you for your answer. Often it's the very first visit to a website that is interesting, because it tells you where the traffic is coming from. Does this mean this first visit's referrer is lost? ("Then you'll still miss the first redirected referrer") Isn't there an Apache way to redirect with something other than RewriteRule to avoid losing the referrer? – Basj Dec 22 '20 at 08:46
  • A redirect starts a new request. You can also try to tag the referrer header onto the request URL as an additional parameter *if it was in the http request in the first place*. Or correlate your http logs with the https logs, if the http logs record the referrer header. – Olaf Kock Dec 22 '20 at 09:02
  • See my additional edit with some more conditions under which the header appears (or is not sent) in the first place. – Olaf Kock Dec 22 '20 at 09:08
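One way to try the log-correlation idea from the comments: log the Referer header of the original plain-HTTP request in the port-80 host, before the redirect happens. This is a sketch; the log path is a hypothetical example, and the inline format string (client address, time, request line, status, Referer, User-Agent) is just one choice built from standard mod_log_config format specifiers.

```apache
<VirtualHost *:80>
  ServerName example.com
  # Hypothetical log file: records the Referer (if the browser sent one)
  # of the plain-HTTP request, which the redirected HTTPS request may lack.
  CustomLog /var/log/apache2/example.com-http.log "%h %t \"%r\" %>s \"%{Referer}i\" \"%{User-Agent}i\""
  RewriteEngine on
  RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</VirtualHost>
```

Matching these entries against the HTTPS access log by client address, time and User-Agent is only approximate, and it helps only when the browser sent a Referer on the HTTP request at all.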

As stated in this answer, it is up to the browser to send the Referer header again after a redirect. And apparently, it does not.

However, you can write your rule like this and read the referrer from the query string if it is missing from the headers.

  RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI}?referrer=%{HTTP_REFERER} [QSA,R=301,L]

Note that a user can always spoof the referrer, but this method makes it even easier (anyone can add ?referrer=... to a link). Depending on your use case, this solution may be a security issue for you.
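Regarding the encoding concern raised in the comments (whether the referrer should be URL-encoded like encodeURIComponent would do), here is a sketch that escapes it with mod_rewrite's built-in `int:escape` map function before appending it. Assumptions: the map name `escape` is arbitrary; `RewriteMap` must be declared in server or virtual-host context (it is not allowed in .htaccess); the [NE] flag is there so that mod_rewrite does not escape the already-escaped value a second time; and `int:escape` does not percent-encode every query-string delimiter (for example `&` and `=`), so a referrer that carries its own query string may still be split. Check the resulting Location header (e.g. with `curl -I`) before relying on it.

```apache
<VirtualHost *:80>
  ServerName example.com
  RewriteEngine on
  # "escape" is an arbitrary name bound to mod_rewrite's internal escape function
  RewriteMap escape int:escape

  # Browser sent a Referer: pass it along escaped, keep the original query (QSA),
  # and skip mod_rewrite's own escaping of the result (NE)
  RewriteCond %{HTTP_REFERER} !^$
  RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI}?referrer=${escape:%{HTTP_REFERER}} [NE,QSA,R=301,L]

  # No Referer: plain redirect; the original query string is kept by default
  RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</VirtualHost>
```

The second rule avoids appending an empty `referrer=` parameter when the browser sent no Referer at all, which is the common case when the link came from an HTTPS page.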

Correction

According to this answer, the referrer will be empty when the user:

switches from a https URL to a http URL.

Additional information can be found in the HTTP specification.

Gokhan Kurt
  • I edited it; there was a typo: `%{HTTP_REFERER}` and not `${HTTP_REFERER}`, is that what you meant? Strangely, it still does not work: the field is always an empty string for me. – Basj Dec 27 '20 at 20:38
  • @Basj Yeah, that was a typo. To be clear, you need to read `referrer` from header, then fallback to reading it from query string if it does not exist in header. – Gokhan Kurt Dec 27 '20 at 20:41
  • Yes, this part is clear; I'll read it from the query string in the (last) HTTPS request. But I don't understand why the HTTP request passes an empty `referrer=`. Maybe because `HTTP_REFERER` contains `http://` or `https://` and Apache's RewriteRule doesn't accept rewriting to a URL like `https://example.com/?referrer=http://test.com`? Have you tested it; does it 100% work for you? – Basj Dec 27 '20 at 20:43
  • Shouldn't we `encodeURIComponent` the HTTP_REFERER as a query string? Does Apache's RewriteRule allow that? – Basj Dec 27 '20 at 20:47
  • @Basj I haven't tried it but you can pass query parameters in rewrite rule. Your problem is different. See this answer. https://stackoverflow.com/a/6880668/2346893 – Gokhan Kurt Dec 27 '20 at 20:49
  • You're right, the reason was `switches from a https URL to a http URL`. So does this mean there is no solution at all in this case? Let's say someone wrote an article (on their HTTPS website) with a link to `http://yourwebsite.com`. Is there no way to get the referrer in this case? – Basj Dec 27 '20 at 20:54
  • @Basj I believe so. This is something the browser decides. For example, the referrer may not be empty in IE 9, but it will be empty in Chrome for security reasons. The only solution is to ensure that external websites use your https URL. – Gokhan Kurt Dec 27 '20 at 20:57