3

Is there a way to replace the non-alphanumeric characters in $request_uri with a space (or a +)?

What I'm trying to do is redirect all 404s on one of my sites to its search engine, where the query is the requested URI. So I have a block in my nginx.conf containing:

error_page 404 = @notfound;
location @notfound {
    return 301 $scheme://$host/?s=$request_uri;
}

While this does work, the URLs it returns contain the original URIs, complete with -, _ and / characters, which causes the search to always return 0 results.

For instance, given this URL: https://example.com/my-articles, the redirect ends up as: https://example.com/?s=/my-articles

What I would like to end up with (ultimately) is this: https://example.com/?s=my+articles (though a + at the beginning works fine too: https://example.com/?s=+my+articles).

I need to do this without Lua or Perl modules. So, how can I accomplish this?

Steve Chambers
Kevin

3 Answers

2

You may need to tweak this depending upon how far down your directory structure you want the replacement to go, but this is the basic concept.

Named location for initial capture of 404s:

location @notfound {
  rewrite (.*) /search$1 last;
}

Named locations are a bit limiting, so all this does is prepend /search to the URI that returned the 404. The last flag tells Nginx to stop processing the current location and pick the best location for the rewritten URI, so we need a block to catch that:

location ^~ /search/ {
  internal;
  rewrite ^/search/(.*)([^a-z0-9\+])(.*)$ /search/$1+$3 last;
  rewrite ^/search/(.*)$ /?s=$1 permanent;
}

The internal directive makes this location accessible only to internal requests; any client request that hits this block directly gets a 404.

The first rewrite changes the last character that is not a lowercase letter, a digit or a + into a +, and then asks Nginx to reevaluate the rewritten URI.

The location block is defined with the ^~ modifier, which means requests matching this location will not be evaluated against any regex-defined location blocks, so this block keeps catching the rewritten requests.

Once all such characters are gone, the first rewrite no longer matches, so the request falls through to the second rewrite, which strips the /search/ prefix, moves the rest into the query string and issues a permanent (301) redirect.

My logs look like this:

>> curl -L -v http://127.0.0.1/users-forum-name.1
<<  "GET /?s=users+forum+name+1 HTTP/1.1"

>> curl -L -v http://127.0.0.1/users-forum-name/long-story/some_underscore
<< "GET /?s=users+forum+name+long+story+some+underscore"

You get the idea.
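Putting the pieces together, the whole setup might look like the sketch below. It is an untested combination of the question's error_page line with the two blocks above, meant to sit inside the existing server block, not a drop-in config. One detail is an assumption of this sketch rather than part of the answer: the character class is widened to `[^a-zA-Z0-9+]`, because the original class would also turn uppercase letters into `+`.

```nginx
# Sketch only: the question's error_page plus the answer's two locations.
error_page 404 = @notfound;

location @notfound {
    # Prefix the failing URI so the /search/ location picks it up.
    rewrite ^(.*)$ /search$1 last;
}

location ^~ /search/ {
    internal;

    # Each pass replaces the LAST character that is not a letter, digit
    # or + with a +, then restarts location matching.
    # Assumption: A-Z added so uppercase letters are left alone.
    # nginx allows at most 10 internal rewrite cycles per request and
    # returns a 500 beyond that, so URIs with many separators may fail.
    rewrite ^/search/(.*)[^a-zA-Z0-9+](.*)$ /search/$1+$2 last;

    # Nothing left to replace: redirect to the search page.
    rewrite ^/search/(.*)$ /?s=$1 permanent;
}
```

With this in place, a request for /my-articles would be expected to pass through /search/my-articles and /search/my+articles before being redirected to /?s=my+articles.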

miknik
  • hmm... for some reason this is not working for me. you can test with: `https://gyo.im/this-is-a-dummy?_=12` – Kevin Sep 13 '18 at 15:07
  • hmm.... so, I had to add `fastcgi_intercept_errors`, in order to get started with processing the 404, however, it presents me with `https://gyo.im/?s=index+php&q=/this-is-a-dummy&_=12` – Kevin Sep 13 '18 at 15:10
  • Your initial request has a query string, so you'll need to get rid of that, either by adding a `?` or by using `set $args '';`. index.php is appearing from somewhere, so I guess you have it specified in an `index` or `try_files` directive? – miknik Sep 13 '18 at 21:39
  • for the time being I have implemented a hybrid redirect. using the nginx config I posted, but redirecting to a php file that cleans the input, does the string replacement, then the redirect – Kevin Sep 14 '18 at 12:27
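
The query-string issue raised in these comments can be sketched as follows. When a rewrite replacement already contains arguments, nginx appends the original request arguments after them; the rewrite documentation describes ending the replacement with a `?` to suppress that. This is a hedged variant of the answer's final rewrite, and nothing in the thread confirms it was tested on the asker's site.

```nginx
location ^~ /search/ {
    internal;
    rewrite ^/search/(.*)([^a-z0-9\+])(.*)$ /search/$1+$3 last;

    # The trailing ? drops the original arguments (such as _=12)
    # instead of appending them after s=...
    rewrite ^/search/(.*)$ /?s=$1? permanent;
}
```

This does not address the stray index.php, which miknik attributes to an `index` or `try_files` directive elsewhere in the config.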
1

You can use the Lua module and transform this variable into what you need using Lua string functions. I'm using OpenResty, which is basically nginx with Lua enabled, but the plain nginx Lua module will do fine. Its directives let you run Lua inside the nginx configuration, either inline within a location using content_by_lua_block / access_by_lua_block, or from a separate file using content_by_lua_file / access_by_lua_file. The documentation is at https://github.com/openresty/lua-nginx-module#content_by_lua . Here is an example from my app.

location ~/.*\.jpg$ {
  set $test '';
  access_by_lua_block {
    -- drop the first character (the leading slash) of the URI
    ngx.var.test = string.sub(ngx.var.uri, 2)
  }
  root /var/www/luaProject/img/;
  try_files $uri /index.html;
}
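
The example above only strips the leading slash; it does not perform the replacement asked about. For readers who can use the Lua module (the question rules it out), a minimal sketch of the substitution itself might look like this. It assumes lua-nginx-module or OpenResty is installed, reuses the question's @notfound location and ?s= parameter, and has not been tested against the asker's site.

```nginx
location @notfound {
    content_by_lua_block {
        -- Replace every run of non-alphanumeric characters with one +.
        -- gsub returns (string, count); only the string is kept.
        local query = ngx.var.uri:gsub("%W+", "+")
        -- Drop the + produced by the leading slash.
        query = query:gsub("^%+", "")
        return ngx.redirect("/?s=" .. query, 301)
    }
}
```

Because ngx.var.uri holds the path without the query string, the original request arguments are not carried into the redirect.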
Donm
  • 1
    `I will need to do this without LUA or Perl modules. So, how can I accomplish this?` – Kevin Sep 13 '18 at 15:00
  • Without Lua I would try a rewrite, just like the other answers show. Why not just use modules? – Donm Sep 18 '18 at 09:21
0
  1. It is generally a bad idea to automatically issue redirects from 404 Not Found pages to elsewhere — the user might have simply mistyped a single character in the URL (e.g., on a mobile phone whilst copying the URL from a flier and having a "fat finger"), which would be very easy to correct once they see a 404 and the obvious typo in the address bar, yet may require starting from scratch if your search-engine doesn't deliver.

  2. If you still want to do it, it might be more efficient to do it within the search engine itself — after all, if your search engine isn't capable of searching by URL, and correcting typos, then it doesn't sound like a very useful search engine, now does it?

  3. If you still want to do it within nginx alone, in front of the search engine, then you can use the fact that http://nginx.org/r/rewrite directives essentially let you implement any sort of DFA (deterministic finite automaton), but, depending on the number of replacements required, it may result in too many cycles and somewhat inflexible rulesets.

    Take a look at the following resources on recursive replacements of given characters within the URL for other characters:

cnst
  • 1
    Sorry mate, #1 is useless, per the question. I do not need opinions... I asked a question looking for an answer... if I wanted opinions, I would have hit Facebook. #2: it's a WordPress site. #3: I will look those over... as I stated in another comment, I have a workaround in place for now – Kevin Sep 20 '18 at 20:10