I'm looking to improve nginx caching by removing irrelevant query parameters (that could come from web crawlers or similar) from the request. I have come across an unwieldy solution on the internet:
set $c_uri $args; # e.g. "param1=true&param4=false"
# remove unwanted parameters one by one
if ($c_uri ~ (.*)(?:&|^)pd=[^&]*(.*)) { set $c_uri $1$2 ; }
if ($c_uri ~ (.*)(?:&|^)mid=[^&]*(.*)) { set $c_uri $1$2 ; }
if ($c_uri ~ (.*)(?:&|^)ml=[^&]*(.*)) { set $c_uri $1$2 ; }
if ($c_uri ~ (.*)(?:&|^)contact_eid=[^&]*(.*)) { set $c_uri $1$2 ; }
...
set $c_uri $scheme://$host$uri$c_uri;
...
location / {
    # set $c_uri as cache_key
    proxy_cache_key $c_uri;
    ...
}
It works, but it's not very concise, it takes a lot of steps, and from what I've learned, "if is evil" in nginx.
I know there are maps, which can do basic regex matching, but they don't work in this scenario because the parameters I need to remove can appear in any number and in any order.
I also found this substitution module, which can do regex replacement, but it's only made for specific operations and not for setting a variable.
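To make the goal concrete, here is a rough sketch in Python (not nginx) of the single replace operation I have in mind. The names UNWANTED and strip_params are made up for illustration, and the parameter list is just the four from the config above; the real list would be longer.

```python
import re

# Hypothetical list of parameters to drop (the four names from the config above).
UNWANTED = ("pd", "mid", "ml", "contact_eid")

# One pattern for all unwanted parameters: a "name=value" pair that starts
# either at the beginning of the string or right after an "&", plus the
# "&" that follows it, if any.
_PATTERN = re.compile(
    r"(?:^|(?<=&))(?:%s)=[^&]*&?" % "|".join(re.escape(name) for name in UNWANTED)
)


def strip_params(args: str) -> str:
    """Return the query string with every unwanted parameter removed."""
    # Removing a trailing pair can leave a dangling "&", so trim it.
    return _PATTERN.sub("", args).rstrip("&")
```

For example, strip_params("pd=1&param1=true&mid=2&param4=false") gives "param1=true&param4=false", no matter where the unwanted parameters sit in the string. What I'm missing is a way to express this kind of one-step replace when setting an nginx variable.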
So I have two questions:
- Does anyone know whether there is some tooling to set a variable by doing a regex replace operation?
- Is using if in this case really that bad? It's not inside a location context, and I don't know whether many consecutive regexes are actually worse than one large regex replace.
I would be very thankful if someone with more nginx know-how could weigh in here and help me out. Thanks :)