We're having a strange problem with one crawler. Occasionally one of its requests triggers a Rails FATAL error, but the trace is very limited and looks something like this:

[2014-07-01 18:16:37] FATAL Rails :
ArgumentError (invalid %-encoding ([binary data omitted])):
  lib/locale_middleware.rb:14:in `call'

The crawler's user agent is:

Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)

We can ask it to stop crawling us via robots.txt, but it would be better to deal with the root cause and avoid failing with a 500 on those requests, if possible.

We can't really reproduce this kind of request either, so any suggestions on how to generate a similar request would be of great help.

We're using Rails 3.2.19 with Unicorn on Ubuntu 12.04. Here's our locale_middleware.rb:

gingerlime
  • Can you reproduce it by putting `?%9g` in your request URL? For example: http://your.site/path/to/a/get?%9g – Benjamin Bouchet Jul 07 '14 at 13:49
  • I guess the exception happens in the next middleware layer. What comes after LocaleMiddleware? You can check by running `rake middleware`. – Pavel Evstigneev Jul 07 '14 at 14:36
  • @BenjaminSinclaire - adding `?%9g` seems to produce a `400` response from our nginx; the request doesn't even reach Rails. Where did you pick up this `%9g`, though? (I'm getting curious.) – gingerlime Jul 07 '14 at 20:03
  • @PavelEvstigneev - after LocaleMiddleware we have `OmniAuth::Builder`, and after that our application routes. – gingerlime Jul 07 '14 at 20:07
  • Here: https://github.com/rack/rack/issues/337 – Benjamin Bouchet Jul 07 '14 at 22:26
  • Thanks @BenjaminSinclaire! That looks like the culprit. The solution seems to be listed here: https://github.com/rack/rack/issues/337#issuecomment-35988871. I was only able to reproduce the error on my dev box; nginx seems to block most invalid input in the URL itself, so in production it probably arrives in a header or the request body. In any case, it looks like this should work. If you want to post an answer, I'd be happy to accept it. – gingerlime Jul 08 '14 at 06:05
  • Glad you fixed it; weird bug indeed. Cheers – Benjamin Bouchet Jul 08 '14 at 09:27
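The `?%9g` example from the comments can be checked in plain Ruby, without nginx rejecting it first. This is a minimal sketch that assumes the exception comes from Rack's parameter unescaping, which on Ruby 1.9+ delegates to the standard library's `URI.decode_www_form_component`; `percent_decode_error` is a hypothetical helper name, not part of Rack or Rails.

```ruby
require 'uri'

# Hypothetical helper: returns the ArgumentError message raised while
# percent-decoding +raw+, or nil when the string decodes cleanly.
def percent_decode_error(raw)
  URI.decode_www_form_component(raw)
  nil
rescue ArgumentError => e
  e.message
end

# "%9g" is malformed: every "%" must be followed by two hex digits.
p percent_decode_error('%9g')    # => "invalid %-encoding (%9g)"
# "%41" is well-formed (it decodes to "A"), so no error is raised.
p percent_decode_error('a=%41')  # => nil
```

The message embeds the raw input string, which would explain why the log in the question shows binary garbage inside the parentheses.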

1 Answer

Special thanks to Benjamin Sinclaire for pointing me to the right issue on GitHub.

The solution was described in this comment:

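# e.g. in config/application.rb; Rack::Robustness comes from the rack-robustness gem
config.middleware.use ::Rack::Robustness do |g|
  g.no_catch_all                     # only rescue the exception classes registered with g.on
  g.on(ArgumentError) { |ex| 400 }   # invalid %-encoding raises ArgumentError: answer 400, not 500
  g.content_type 'text/plain'
  g.body { |ex| ex.message }         # return the exception message as the response body
  g.ensure(true) { |ex| env['rack.errors'].write(ex.message) }  # also write the message to rack.errors
end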
gingerlime
  • Does this catch every `ArgumentError`, including those that have nothing to do with encoding errors? I think it does, and that could be problematic. If you don't want that, [see this solution](http://stackoverflow.com/a/24727310/6962). – Henrik N Jul 14 '14 at 15:02
  • As far as I understand, this will catch `ArgumentError`s raised outside our app. We already rescue `ArgumentError`s within our codebase in our application controller, so the exposure is limited to code outside it, e.g. middleware or Rails itself. Also, I added logging to the snippet above and haven't spotted anything other than those encoding issues so far. – gingerlime Jul 14 '14 at 22:58
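For the concern about catching every `ArgumentError`, here is a narrower, gem-free variant of the `Rack::Robustness` configuration in the answer, sketched as plain Rack middleware: it turns only `ArgumentError`s whose message starts with Ruby's `invalid %-encoding` prefix into a 400 and re-raises everything else. `BadEncodingGuard` is a hypothetical name, and the stand-in downstream app that decodes `QUERY_STRING` is an assumption used only to exercise it; this is not the solution linked in the comments.

```ruby
require 'uri'

# Hypothetical middleware: converts only "invalid %-encoding" ArgumentErrors
# into a 400 response and lets every other exception propagate.
class BadEncodingGuard
  PREFIX = 'invalid %-encoding'.freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue ArgumentError => e
    raise unless e.message.start_with?(PREFIX) # re-raise unrelated errors

    # Log to the Rack error stream when the server provides one.
    env['rack.errors'].write("#{e.message}\n") if env['rack.errors']
    [400, { 'Content-Type' => 'text/plain' }, ['Bad Request']]
  end
end

# Stand-in downstream app (assumption): percent-decodes the query string,
# so malformed input raises ArgumentError just as in the question's log.
inner = lambda do |env|
  URI.decode_www_form_component(env['QUERY_STRING'])
  [200, { 'Content-Type' => 'text/plain' }, ['ok']]
end

app = BadEncodingGuard.new(inner)
p app.call({ 'QUERY_STRING' => 'a=%41' }).first  # => 200
p app.call({ 'QUERY_STRING' => '%9g' }).first    # => 400
```

A Rack middleware only rescues exceptions raised by the layers it wraps, so in a Rails 3.2 app it would presumably need to sit before the middleware that raises, e.g. registered with `config.middleware.insert_before` using `LocaleMiddleware` as the anchor.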