
I need to write an nginx location directive that proxies requests under a subdirectory to another server, preserving the URL encoding and removing the subdirectory prefix.

Here's an artificial example. A request like this:

http://1.2.3.4/api/save/http%3A%2F%2Fexample.com

should pass as

http://abcd.com/save/http%3A%2F%2Fexample.com

I tried several different ways. Here are a couple of them:

  1. From this SO question

     location /api/ {
         rewrite ^/api(/.*) $1 break;
         proxy_set_header   X-Real-IP        $remote_addr;
         proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
         proxy_set_header   Host             $host;
         proxy_pass http://abcd.com;
     }
    

But it decodes the string, so http://abcd.com gets /save/http://example.com

  2. From another SO question

     location /api/ {
         proxy_set_header   X-Real-IP        $remote_addr;
         proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
         proxy_set_header   Host             $host;
         proxy_pass http://abcd.com;
     }
    

But it keeps subdirectory, so http://abcd.com gets /api/save/http%3A%2F%2Fexample.com.

What's needed is somewhere in the middle. Thank you!

UPD: Here's a ticket in the nginx bug tracker

Antoine Pinsard
rinat.io
  • You could try something with Lua. But first of all, you should not need this; as per the HTTP spec, these URLs are identical – Alexey Ten Feb 24 '15 at 04:48
  • Or use subdomain instead of subdirectory – Alexey Ten Feb 24 '15 at 04:49
  • @AlexeyTen The server running on the `http://abcd.com` is processing those requests in different way and I haven't control over it. Do you know a link to that http spec excerpt? I cannot find it – rinat.io Feb 24 '15 at 08:49
  • RFC 2616 section 3.2.3 – Alexey Ten Feb 24 '15 at 09:21
  • 1
    @AlexeyTen It says _Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding._ I don't know what is `[42]` in the RFC 2396, but section 2.2 in that RFC says that those characters are reserved — `";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","` – rinat.io Feb 24 '15 at 10:52
  • Well, I admit you're right, I've missed that part. But there is no easy way to fix this nginx behaviour. There are some bugs in nginx trac, you could add yours. http://trac.nginx.org/nginx/query?status=accepted&status=assigned&status=new&status=reopened&summary=~uri&order=priority. So, I think that the simplest way is to have subdomain. – Alexey Ten Feb 24 '15 at 14:49
  • @rinat.io, ping. Is there anything missing from my answer? – cnst Jun 22 '16 at 23:14
  • @AlexeyTen, actually, there is! See my answer, if you're still interested! – cnst Jun 22 '16 at 23:14
  • @cnst it's good as a brain training, but for real production OP better should fix backend – Alexey Ten Jun 23 '16 at 06:49
  • @AlexeyTen as I said in the ticket comment it doesn't seem like decoding works according to RFC (I'm not sure though), so I don't understand what might be wrong with backend. – rinat.io Jun 23 '16 at 16:26
  • @cnst, unfortunately I cannot check your answer now since I don't use that backend these days. – rinat.io Jun 23 '16 at 16:26
  • @rinat.io, but my answer does include a complete test setup with two separate servers, where it can be reproduced which directions do and do not result in the encoding being decoded; thanks for the upvote, but I'd appreciate an accept as well. :-) – cnst Jun 23 '16 at 16:32
  • @AlexeyTen, I agree with rinat, there is nothing that needs to be fixed in the backend (nor in nginx itself, btw). Decoding by default is a good usability and security feature of nginx; but disabling said decoding can also be beneficial for some test cases such as these. – cnst Jun 23 '16 at 16:35

2 Answers


But there is no easy way to fix this nginx behaviour. There are some bugs in nginx trac, you could add yours. trac.nginx.org/nginx/…. So, I think that the simplest way is to have subdomain. – Alexey Ten Feb 24 '15 at 14:49

https://trac.nginx.org/nginx/ticket/727

If you want nginx to do something custom, you can do so using proxy_pass with variables (and the $request_uri variable, which contains original unescaped request URI as sent by a client). In this case it will be your responsibility to do correct URI transformations. Note though that this can easily cause security issues and should be done with care.

Challenge accepted!

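    location /api/ {
        # replace the normalized URI with the raw, still-encoded request URI
        rewrite ^ $request_uri;
        # strip the /api/ prefix; "break" stops rewrite processing here
        rewrite ^/api/(.*) $1 break;
        # reached only if the second rewrite did not match
        return 400;
        proxy_pass http://127.0.0.1:82/$uri;
    }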

That's it, folks!


Here's the full proof.

The config file for nginx/1.2.1:

server {
    listen 81;
    #first, the solution
    location /api/ {
        rewrite ^ $request_uri;
        rewrite ^/api/(.*) $1 break;
        return 400; #if the second rewrite won't match
        proxy_pass http://127.0.0.1:82/$uri;
    }
    #next, a few control groups
    location /dec/ {
        proxy_pass http://127.0.0.1:82/;
    }
    location /mec/ {
        rewrite ^/mec(/.*) $1 break;
        proxy_pass http://127.0.0.1:82;
    }
    location /nod/ {
        proxy_pass http://127.0.0.1:82;
    }
}

server {
    listen 82;
    return 200 $request_uri\n;
}

Here are the results of running the queries for each location:

% echo localhost:81/{api,dec,mec,nod}/save/http%3A%2F%2Fexample.com | xargs -n1 curl
/save/http%3A%2F%2Fexample.com
/save/http:/example.com
/save/http:/example.com
/nod/save/http%3A%2F%2Fexample.com
%

Note that having that extra return 400; is quite important — otherwise, you risk having a security issue (file access through //api etc), as Maxim has briefly mentioned in your trac ticket.
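
A variant reported in the comments on this answer keeps the leading slash inside the capture group, so the rewritten URI is never empty. That is said to avoid `the rewritten URI has a zero length` errors, for example on a request for exactly /api/. The sketch below is untested here and assumes the same test backend on 127.0.0.1:82 as the config above; substitute your own upstream.

```nginx
location /api/ {
    # start from the raw request URI so %XX sequences stay encoded
    rewrite ^ $request_uri;
    # capture everything after /api, including the leading slash,
    # so $1 is at least "/" and never an empty string
    rewrite ^/api(/.*) $1 break;
    # still needed: rejects raw URIs such as //api/... that matched
    # this location only after normalization
    return 400;
    # no slash before $uri, because the captured path already starts with one
    proxy_pass http://127.0.0.1:82$uri;
}
```

If the upstream is given as a hostname rather than an IP address, note that proxy_pass with variables resolves the name at request time, so it needs either a matching upstream block or a resolver directive.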


P.S. If you think using the rewrite engine as a finite-state automaton is super cool, you might also want to check out my http://mdoc.su/ project, or fork it on GitHub.

cnst
  • 1
    To avoid `the rewritten URI has a zero length` errors, I have used rewrite ^ $request_uri; rewrite ^/api(/.*) $1 break; return 400; proxy_pass http://127.0.0.1:82$uri; – Robert Hensing Feb 08 '18 at 15:17
  • 3
    I'm wondering if something has changed in later nginx versions. Try as I might, with nginx 1.13.12 and this approach any URL encoding present in the original request_uri (e.g. %2F) instead of decoding gets encoded again (so becomes %252F) – jrg Nov 21 '20 at 10:34
  • To comment on my comment - I was trying to do a rewrite for a redirect but it seems it's impossible to defeat the re-encoding there, and instead you need to extract the parts of the URL and then use a return instead. – jrg Nov 21 '20 at 11:07
  • It would be worth adding in the answer an explanation why `//api` matches the `location` block. It is because "_The matching is performed against a normalized URI, after decoding the text encoded in the “%XX” form, resolving references to relative path components “.” and “..”, and possible compression of two or more adjacent slashes into a single slash._" as stated in [location](https://nginx.org/en/docs/http/ngx_http_core_module.html#location) documentation. And compression being enabled by default - see [merge_slashes](https://nginx.org/en/docs/http/ngx_http_core_module.html#merge_slashes) – martin Feb 06 '21 at 12:41

What you have to do is fairly easy, as long as we are talking about prefix matching with ^~ or no modifier:

location /api/ {
  # if you don't want to pass /api/ add a trailing slash to the proxy_pass
  proxy_pass http://localhost:8080/;

  ...
}

Everything will then be passed along without decoding; you don't have to pass $uri.

Also, when you use proxy_pass, you should set these headers:

# pass headers and body along
proxy_pass_request_headers on;
proxy_pass_request_body on;

# set some headers to make sure the reverse proxy is passing along everything necessary
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
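
Put together, the pieces above might look like the following sketch. The upstream `http://localhost:8080/` is the placeholder from this answer, and `proxy_pass_request_headers` and `proxy_pass_request_body` are left out because both default to `on`. Treat the encoding behaviour as something to verify against your own backend: in the test setup from the other answer, the `/dec/` location uses this same trailing-slash form and its backend received a decoded path.

```nginx
location /api/ {
    # the trailing slash replaces the matched /api/ prefix with /
    proxy_pass http://localhost:8080/;

    # forward the original host, client address chain, and scheme
    proxy_set_header Host              $host;
    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}
```

A quick check is to point the upstream at a server that echoes `$request_uri`, as the test backend on port 82 in the other answer does, and request a path containing `%2F`.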
Gradient
  • "everything will be passed" : True, until you replace `localhost` upstream with a variable. – martin Feb 05 '21 at 17:10
  • Permission to kick myself. So simple! *facepalms* I actually had precisely the reverse issue: making sure that the full path is passed. Who'd guess that a trailing slash would make that much of a difference... thanks, your answer saved me hours and hours of debugging! – Gwyneth Llewelyn Jan 13 '22 at 09:42
  • I don't think this is true. Nginx doesn't decode the query string, but it still appears to decode the URI when I use this method. I've verified the accepted answer does not decode URI. – Gillespie Jun 13 '22 at 22:48