
I'm looking to improve nginx caching by removing irrelevant query parameters (for example, ones added by web crawlers) from the cache key. I have come across an unwieldy solution on the internet:

```nginx
set $c_uri $args; # e.g. "param1=true&param4=false"

# remove unwanted parameters one by one
if ($c_uri ~ (.*)(?:&|^)pd=[^&]*(.*)) { set $c_uri $1$2; }
if ($c_uri ~ (.*)(?:&|^)mid=[^&]*(.*)) { set $c_uri $1$2; }
if ($c_uri ~ (.*)(?:&|^)ml=[^&]*(.*)) { set $c_uri $1$2; }
if ($c_uri ~ (.*)(?:&|^)contact_eid=[^&]*(.*)) { set $c_uri $1$2; }
# ...

# build the full key; the "?" keeps path and query apart, so that
# /foo?bar=1 and /foob?ar=1 do not produce the same key
set $c_uri $scheme://$host$uri?$c_uri;
# ...

location / {
  # use $c_uri as the cache key
  proxy_cache_key $c_uri;
  # ...
}
```

It works, but it's not very concise, it takes a lot of steps, and, from what I've learned, `if` is evil.

I know there is `map`, which can do basic regex matching, but it doesn't work in this scenario, because the parameters I need to remove can appear in any number and in any order.
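
A single `map` indeed cannot strip an arbitrary set of parameters, but one possible workaround is to chain one `map` per unwanted parameter, each reading the previous map's result, so the order of parameters stops mattering. This is an untested sketch resting on assumptions: the maps go in the `http` context, nginx must be 1.11.0 or later (map values that combine several variables), and all variable and capture names (`$args_no_pd`, `pd_pre`, `$args_clean`, ...) are hypothetical. Like the `if` chain, each map removes at most one occurrence of its parameter.

```nginx
# http context. Each map removes one unwanted parameter from the output of
# the previous map. All variable and capture names are hypothetical.
map $args $args_no_pd {
    # no "pd" parameter: pass the arguments through unchanged
    default $args;
    # "pd=..." at the start (plus its trailing "&"), or "&pd=..." elsewhere
    "~^(?<pd_pre>.*?)(?:^pd=[^&]*&?|&pd=[^&]*)(?<pd_post>.*)$" $pd_pre$pd_post;
}

map $args_no_pd $args_no_mid {
    default $args_no_pd;
    "~^(?<mid_pre>.*?)(?:^mid=[^&]*&?|&mid=[^&]*)(?<mid_post>.*)$" $mid_pre$mid_post;
}

# to handle more parameters (contact_eid, ...), add further maps before
# this one so that $args_clean stays the last variable in the chain
map $args_no_mid $args_clean {
    default $args_no_mid;
    "~^(?<ml_pre>.*?)(?:^ml=[^&]*&?|&ml=[^&]*)(?<ml_post>.*)$" $ml_pre$ml_post;
}

# inside the existing server block
location / {
    # map variables are only evaluated when used, i.e. when the key is built
    proxy_cache_key $scheme://$host$uri?$args_clean;
    # ... proxy_pass, proxy_cache and the rest of the existing config
}
```

The split alternation (`^pd=...&?` versus `&pd=...`) is a design choice so that removing the first parameter does not leave a stray leading `&`; with it, `pd=1&b=2` and `b=2` map to the same key.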

I also found this substitution module, which can do regex replacement, but it is made only for specific operations, not for setting a variable.

So I have two questions:

  • Is there any tooling (a directive or module) to set a variable from the result of a regex replace operation?
  • Is using `if` in this case really that bad? It's not inside a `location` context, and I don't know whether many consecutive regexes are actually worse than one large regex replacement.

I would be very thankful if someone with more nginx know-how could weigh in here and help me out. Thanks :)

Max Kless
  • Would it be an option to keep a defined set of args instead of removing unwanted ones? – slauth Sep 21 '21 at 12:57
  • "If is Evil... when used in location context" - however, you are not using `if` in a `location` context. – Richard Smith Sep 21 '21 at 13:40
  • @slauth Sadly, no: it's a large application and there are many possible args. – Max Kless Sep 21 '21 at 14:47
  • @RichardSmith you are right and thank you for answering my second question. Still, I'm not sure what the performance implications of many if statements are. – Max Kless Sep 21 '21 at 14:48
  • Currently you have a long list of regular expressions that each need to be evaluated individually for every request. If any or all of the arguments may appear in any request in a random order, then your solution is probably best. – Richard Smith Sep 21 '21 at 15:11
  • @MaxKless 1) For removing query arguments from the argument list, I use the following regex: `if ($c_uri ~ (.*)(^|&)pd=([^&]*)(\2|$)&?(.*)) { set $c_uri $1$4$5; }` The difference is that on the string `pd=a&b=c` yours gives `&b=c`, while my version gives `b=c`. 2) For better performance, I recommend using [`pcre_jit on;`](https://nginx.org/en/docs/ngx_core_module.html#pcre_jit) if your environment allows it. – Ivan Shatsky Sep 21 '21 at 18:16
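
Both suggestions from Ivan Shatsky's comment can be combined into a fragment like the one below. It is a sketch, not a tested configuration: `pcre_jit` belongs in the main (top-level) context and only has an effect if nginx was built against a PCRE library with JIT support; the `if`/`set` lines stay in the `server` context as in the question, and the remaining parameters are elided just as they are there.

```nginx
# main context of nginx.conf, outside the http { } block:
# JIT-compile the regular expressions known at configuration time
pcre_jit on;

http {
    server {
        set $c_uri $args;

        # Ivan Shatsky's pattern: a separating "&" is kept only when the
        # parameter sat between two others, so "pd=a&b=c" becomes "b=c"
        if ($c_uri ~ (.*)(^|&)pd=([^&]*)(\2|$)&?(.*)) { set $c_uri $1$4$5; }
        if ($c_uri ~ (.*)(^|&)mid=([^&]*)(\2|$)&?(.*)) { set $c_uri $1$4$5; }
        if ($c_uri ~ (.*)(^|&)ml=([^&]*)(\2|$)&?(.*)) { set $c_uri $1$4$5; }
        # ... one line per further unwanted parameter

        set $c_uri $scheme://$host$uri?$c_uri;

        location / {
            proxy_cache_key $c_uri;
            # ... rest of the existing proxy/cache configuration
        }
    }
}
```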
