
I get errors triggered by a Chinese bot (http://www.easou.com/search/spider.html) when it crawls my websites.

All my applications run Ruby 1.9.3 and Rails 3.2.x.

Here is a stack trace:

An ArgumentError occurred in listings#show:

  invalid byte sequence in UTF-8
  rack (1.4.5) lib/rack/utils.rb:104:in `normalize_params'


-------------------------------
Request:
-------------------------------

  * URL       : http://www.my-website.com
  * IP address: X.X.X.X
  * Parameters: {"action"=>"show", "controller"=>"listings", "id"=>"location-t7-villeurbanne--58"}
  * Rails root: /.../releases/20140708150222
  * Timestamp : 2014-07-09 02:57:43 +0200

-------------------------------
Backtrace:
-------------------------------

  rack (1.4.5) lib/rack/utils.rb:104:in `normalize_params'
  rack (1.4.5) lib/rack/utils.rb:96:in `block in parse_nested_query'
  rack (1.4.5) lib/rack/utils.rb:93:in `each'
  rack (1.4.5) lib/rack/utils.rb:93:in `parse_nested_query'
  rack (1.4.5) lib/rack/request.rb:332:in `parse_query'
  actionpack (3.2.18) lib/action_dispatch/http/request.rb:275:in `parse_query'
  rack (1.4.5) lib/rack/request.rb:209:in `POST'
  actionpack (3.2.18) lib/action_dispatch/http/request.rb:237:in `POST'
  actionpack (3.2.18) lib/action_dispatch/http/parameters.rb:10:in `parameters'

-------------------------------
Session:
-------------------------------

  * session id: nil
  * data: {}

-------------------------------
Environment:
-------------------------------

  * CONTENT_LENGTH                                 : 514
  * CONTENT_TYPE                                   : application/x-www-form-urlencoded
  * HTTP_ACCEPT                                    : text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
  * HTTP_ACCEPT_ENCODING                           : gzip, deflate
  * HTTP_ACCEPT_LANGUAGE                           : zh;q=0.9,en;q=0.8
  * HTTP_CONNECTION                                : close
  * HTTP_HOST                                      : www.my-website.com
  * HTTP_REFER                                     : http://www.my-website.com/
  * HTTP_USER_AGENT                                : Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
  * ORIGINAL_FULLPATH                              : /
  * PASSENGER_APP_SPAWNER_IDLE_TIME                : -1
  * PASSENGER_APP_TYPE                             : rack
  * PASSENGER_CONNECT_PASSWORD                     : [FILTERED]
  * PASSENGER_DEBUGGER                             : false
  * PASSENGER_ENVIRONMENT                          : production
  * PASSENGER_FRAMEWORK_SPAWNER_IDLE_TIME          : -1
  * PASSENGER_FRIENDLY_ERROR_PAGES                 : true
  * PASSENGER_GROUP                                :
  * PASSENGER_MAX_REQUESTS                         : 0
  * PASSENGER_MIN_INSTANCES                        : 1
  * PASSENGER_SHOW_VERSION_IN_HEADER               : true
  * PASSENGER_SPAWN_METHOD                         : smart-lv2
  * PASSENGER_USER                                 :
  * PASSENGER_USE_GLOBAL_QUEUE                     : true
  * PATH_INFO                                      : /
  * QUERY_STRING                                   :
  * REMOTE_ADDR                                    : 183.60.212.153
  * REMOTE_PORT                                    : 52997
  * REQUEST_METHOD                                 : GET
  * REQUEST_URI                                    : /
  * SCGI                                           : 1
  * SCRIPT_NAME                                    :
  * SERVER_PORT                                    : 80
  * SERVER_PROTOCOL                                : HTTP/1.1
  * SERVER_SOFTWARE                                : nginx/1.2.6
  * UNION_STATION_SUPPORT                          : false
  * _                                              : _
  * action_controller.instance                     : listings#show
  * action_dispatch.backtrace_cleaner              : #<Rails::BacktraceCleaner:0x000000056e8660>
  * action_dispatch.cookies                        : #<ActionDispatch::Cookies::CookieJar:0x00000006564e28>
  * action_dispatch.logger                         : #<ActiveSupport::TaggedLogging:0x0000000318aff8>
  * action_dispatch.parameter_filter               : [:password, /RAW_POST_DATA/, /RAW_POST_DATA/, /RAW_POST_DATA/]
  * action_dispatch.remote_ip                      : 183.60.212.153
  * action_dispatch.request.content_type           : application/x-www-form-urlencoded
  * action_dispatch.request.parameters             : {"action"=>"show", "controller"=>"listings", "id"=>"location-t7-villeurbanne--58"}
  * action_dispatch.request.path_parameters        : {:action=>"show", :controller=>"listings", :id=>"location-t7-villeurbanne--58"}
  * action_dispatch.request.query_parameters       : {}
  * action_dispatch.request.request_parameters     : {}
  * action_dispatch.request.unsigned_session_cookie: {}
  * action_dispatch.request_id                     : 9f8afbc8ff142f91ddbd9cabee3629f3
  * action_dispatch.routes                         : #<ActionDispatch::Routing::RouteSet:0x0000000339f370>
  * action_dispatch.show_detailed_exceptions       : false
  * action_dispatch.show_exceptions                : true
  * rack-cache.allow_reload                        : false
  * rack-cache.allow_revalidate                    : false
  * rack-cache.cache_key                           : Rack::Cache::Key
  * rack-cache.default_ttl                         : 0
  * rack-cache.entitystore                         : rails:/
  * rack-cache.ignore_headers                      : ["Set-Cookie"]
  * rack-cache.metastore                           : rails:/
  * rack-cache.private_headers                     : ["Authorization", "Cookie"]
  * rack-cache.storage                             : #<Rack::Cache::Storage:0x000000039c5768>
  * rack-cache.use_native_ttl                      : false
  * rack-cache.verbose                             : false
  * rack.errors                                    : #<IO:0x000000006592a8>
  * rack.input                                     : #<PhusionPassenger::Utils::RewindableInput:0x0000000655b3a0>
  * rack.multiprocess                              : true
  * rack.multithread                               : false
  * rack.request.cookie_hash                       : {}
  * rack.request.form_hash                         :
  * rack.request.form_input                        : #<PhusionPassenger::Utils::RewindableInput:0x0000000655b3a0>
  * rack.request.form_vars                        : [binary request body, not valid UTF-8; garbled bytes omitted]
  * rack.request.query_hash                        : {}
  * rack.request.query_string                      :
  * rack.run_once                                  : false
  * rack.session                                   : {}
  * rack.session.options                           : {:path=>"/", :domain=>nil, :expire_after=>nil, :secure=>false, :httponly=>true, :defer=>false, :renew=>false, :coder=>#<Rack::Session::Cookie::Base64::Marshal:0x000000034d4ad8>, :id=>nil}
  * rack.url_scheme                                : http
  * rack.version                                   : [1, 0]

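As you can see, there is no invalid UTF-8 in the URL; it only appears in rack.request.form_vars. Note that the request is a GET with an empty query string, yet it carries a 514-byte application/x-www-form-urlencoded body, and that body is what Rack tries to parse (see the `POST` and `parse_query` frames in the backtrace). I get about a hundred errors per day, all similar to this one.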
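The failure can be reproduced outside Rails with plain Ruby. This is a minimal sketch, assuming (as the backtrace suggests) that Rack percent-decodes the form body into a string tagged as UTF-8 and then matches it against a regex; the sample body `%FF%FE=1` is made up to stand in for the bot's binary payload.

```ruby
require 'uri'

# Made-up stand-in for the bot's body: %-escaped bytes that are not valid UTF-8.
body = "%FF%FE=1"

# Percent-decode the key, roughly as a form parser does; the result is tagged UTF-8.
key = URI.decode_www_form_component(body.split('=', 2).first)

p key.encoding        # => #<Encoding:UTF-8>
p key.valid_encoding? # => false

begin
  # Any regex match on a string with a broken encoding raises ArgumentError.
  key =~ /\A[\[\]]*([^\[\]]+)\]*/
rescue ArgumentError => e
  puts e.message      # => invalid byte sequence in UTF-8
end
```

So the URL can be perfectly clean: it is enough for the decoded body to contain bytes that do not form valid UTF-8.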

So I tried to force UTF-8 on rack.request.form_vars with something like this:

class RackFormVarsSanitizer
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["rack.request.form_vars"] 
      env["rack.request.form_vars"] = env["rack.request.form_vars"].force_encoding('UTF-8')
    end
    @app.call(env)
  end
end

And I add it in my application.rb:

config.middleware.use "RackFormVarsSanitizer"

It doesn't seem to work, because I still get the errors. The problem is that I can't test it in development mode, because I don't know how to set rack.request.form_vars.
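Two things probably explain why the middleware above has no effect. First, as far as I can tell from Rack 1.4, `rack.request.form_vars` is set by `Rack::Request#POST` when the parameters are first parsed, so a middleware that runs before that most likely sees `nil` there (and one that runs after it would be too late, since the exception is raised during parsing). Second, `force_encoding` only relabels the bytes and never removes invalid ones. The sketch below (plain Ruby, sample bytes made up) shows the second point and one way to actually drop invalid bytes that also works on Ruby 1.9.3; on Ruby 2.1+ `String#scrub('')` does the same job.

```ruby
# Made-up sample: "café " followed by two bytes that are not valid UTF-8.
raw = "caf\xC3\xA9 \xFF\xFE".dup.force_encoding(Encoding::BINARY)

# force_encoding only relabels the bytes; the invalid ones are still there.
relabeled = raw.dup.force_encoding('UTF-8')
p relabeled.valid_encoding? # => false

# Drops invalid bytes. Transcoding UTF-8 to UTF-8 is a no-op, so the string
# takes a detour through UTF-16LE, where invalid: :replace actually applies.
def strip_invalid_utf8(str)
  str.dup.force_encoding('UTF-8')
     .encode('UTF-16LE', invalid: :replace, undef: :replace, replace: '')
     .encode('UTF-8')
end

clean = strip_invalid_utf8(raw)
p clean.valid_encoding?        # => true
p clean == "caf\u00e9 "        # => true
```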

I also installed the utf8-cleaner gem, but it doesn't fix anything.
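Rather than patching `rack.request.form_vars` after the fact, a middleware can clean the raw body in `rack.input` before anything parses it. The following is a hypothetical sketch, not the code of utf8-cleaner or any other gem: the class name `FormBodyUTF8Sanitizer` is made up, it only handles `application/x-www-form-urlencoded` bodies, and it assumes Ruby 2.0+ (`String#b`). Each key and value is percent-decoded, stripped of invalid UTF-8 bytes and re-encoded; malformed `%` escapes are kept as literal text.

```ruby
require 'stringio'
require 'uri'

# Hypothetical middleware (not taken from any gem): cleans an
# application/x-www-form-urlencoded body in rack.input *before* Rack parses it.
class FormBodyUTF8Sanitizer
  FORM_TYPE = 'application/x-www-form-urlencoded'

  def initialize(app)
    @app = app
  end

  def call(env)
    input = env['rack.input']
    if input && env['CONTENT_TYPE'].to_s.start_with?(FORM_TYPE)
      clean = sanitize(input.read.to_s.b)
      env['rack.input']     = StringIO.new(clean.b)
      env['CONTENT_LENGTH'] = clean.bytesize.to_s
    end
    @app.call(env)
  end

  private

  # Re-encodes every key and value so that each one decodes to valid UTF-8.
  def sanitize(body)
    body.split(/[&;]/).map { |pair|
      pair.split('=', 2).map { |part| URI.encode_www_form_component(clean_component(part)) }.join('=')
    }.join('&')
  end

  # Percent-decodes one component and drops bytes that are not valid UTF-8.
  def clean_component(part)
    decoded = begin
      URI.decode_www_form_component(part)
    rescue ArgumentError
      part.dup # malformed %-escape: keep the raw text as a literal
    end
    decoded.force_encoding('UTF-8')
           .encode('UTF-16LE', invalid: :replace, undef: :replace, replace: '')
           .encode('UTF-8')
  end
end
```

In a Rails 3.2 app such a class would presumably be registered near the top of the stack, e.g. `config.middleware.insert 0, FormBodyUTF8Sanitizer`, so that it runs before anything reads the parameters.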

Does anybody have an idea how to fix this, or how to trigger it in development?
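Because `rack.request.form_vars` is just the request body as Rack read it, the error can probably be triggered in development by sending a request shaped like the logged one: a GET with an `application/x-www-form-urlencoded` body containing invalid bytes. A sketch using Ruby's standard `Net::HTTP`; the helper name `build_bad_form_request`, the payload bytes and the host/port are all assumptions.

```ruby
require 'net/http'

# Hypothetical helper: a GET request with a form-encoded body that is not
# valid UTF-8, mimicking the logged EasouSpider request (GET, empty query
# string, application/x-www-form-urlencoded body).
def build_bad_form_request(path = '/')
  req = Net::HTTP::Get.new(path)
  req['Content-Type'] = 'application/x-www-form-urlencoded'
  # Made-up payload: a %-escaped invalid byte pair plus raw invalid bytes.
  req.body = "%FF%FE=1&\xFF\xFE=2".dup.force_encoding(Encoding::BINARY)
  req
end

req = build_bad_form_request('/')

# Uncomment to send it to a local development server (host and port are assumptions):
# Net::HTTP.start('localhost', 3000) { |http| puts http.request(req).code }
```

Sent to `rails server` in development, this should surface the same ArgumentError in the log, assuming nothing in front of the app drops the body of GET requests.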

CupraR_On_Rails
  • If you can't set `rack.request.form_vars`, is there a reason you can't just make a PORO `model` to force the encoding? Just loop through your params, or do something similar to `deep_stringify_keys` but substituting `val.force_encoding('utf-8')` for the `to_s`. – MCB Jul 09 '14 at 13:42
  • Related: http://stackoverflow.com/questions/24611841/mysterious-rails-error-with-almost-no-trace – Henrik N Jul 13 '14 at 22:02
  • As Henrik N pointed out, we encountered the same problem, and the solution that worked for us was posted on http://stackoverflow.com/a/24637719/305019 – gingerlime Jul 14 '14 at 23:02
  • EasouSpider is off the hook. I'm convinced this is some kind of global cultural warfare. – gtd Jul 19 '14 at 11:48
  • How about adding EasouSpider to robots.txt? I don't suppose I get much traffic from that site, and rather than messing around with rack, maybe just getting rid of the pesky thing at the root is the best bet. – David N. Welton Jul 23 '14 at 05:47
  • @CupraR_Rails: Did you write any specs (e.g. RSpec tests) for this? Please let me know. – K M Rakibul Islam Jul 24 '14 at 22:05

3 Answers


So you don't have to piece together the comments in my other reply, this is what I'm doing now – I've seen no errors for 24 hours, so it looks very promising:

Add rack-utf8_sanitizer to your Gemfile:

gem 'rack-utf8_sanitizer'

and run

bundle

Put this middleware (https://gist.github.com/bf4/d26259acfa29f3b9882b#file-exception_app-rb) in app/middleware/handle_invalid_percent_encoding.rb and rename the class to HandleInvalidPercentEncoding (because ExceptionApp is a bit too general).
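The linked middleware is not reproduced here. As a rough idea of what a middleware of this kind does, here is a hypothetical stdlib-only sketch; the class name `RejectInvalidPercentEncoding` and the plain-text 400 response are assumptions, not taken from the gist. It decodes the form body up front and, if an `invalid %-encoding` ArgumentError comes up there or further down the stack, answers 400 Bad Request instead of crashing; any other ArgumentError is re-raised, in line with the NOTE below that the handler "just raises for encoding errors it doesn't cover".

```ruby
require 'stringio'
require 'uri'

# Hypothetical stand-in (not the linked gist's code): answers 400 Bad Request
# when a form body contains a malformed %-escape, instead of letting the
# ArgumentError become a 500. Other ArgumentErrors are re-raised.
class RejectInvalidPercentEncoding
  FORM_TYPE = 'application/x-www-form-urlencoded'

  def initialize(app)
    @app = app
  end

  def call(env)
    input = env['rack.input']
    if input && env['CONTENT_TYPE'].to_s.start_with?(FORM_TYPE)
      body = input.read.to_s.b
      input.rewind # leave the body readable for the rest of the stack
      body.split(/[&;]/).each do |pair|
        pair.split('=', 2).each { |part| URI.decode_www_form_component(part) }
      end
    end
    @app.call(env)
  rescue ArgumentError => e
    raise unless e.message =~ /invalid %-encoding/
    [400, { 'Content-Type' => 'text/plain' }, ['Bad Request']]
  end
end

# Usage: a trivial downstream app and two request envs.
form   = 'application/x-www-form-urlencoded'
ok_app = ->(_env) { [200, { 'Content-Type' => 'text/plain' }, ['ok']] }
guard  = RejectInvalidPercentEncoding.new(ok_app)
p guard.call('CONTENT_TYPE' => form, 'rack.input' => StringIO.new('a=%ZZ')).first # => 400
p guard.call('CONTENT_TYPE' => form, 'rack.input' => StringIO.new('a=%41')).first # => 200
```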

In the config block of config/application.rb do:

require "#{Rails.root}/app/middleware/handle_invalid_percent_encoding.rb"


# NOTE: These must be in this order relative to each other.
# HandleInvalidPercentEncoding just raises for encoding errors it doesn't cover,
# so it must run after (= be inserted before) Rack::UTF8Sanitizer.
config.middleware.insert 0, HandleInvalidPercentEncoding
config.middleware.insert 0, Rack::UTF8Sanitizer  # from a gem

Deploy. Done.

(app happens to be the location for middleware in the project I'm working on, but I'd probably prefer lib. Whatever. Either should work.)
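The ordering NOTE above relies on `insert 0` always putting a middleware at the very front, so the one inserted last ends up outermost and sees the request first. A toy model of that, using plain Ruby arrays rather than Rails' actual middleware stack; the two classes are stand-ins that only record when they run:

```ruby
# Toy model only (not Rails' ActionDispatch::MiddlewareStack): index 0 is the
# outermost middleware, i.e. the first one to see the request.
CALLS = []

# Stand-ins for Rack::UTF8Sanitizer and HandleInvalidPercentEncoding.
class ToySanitizer
  def initialize(app); @app = app; end
  def call(env); CALLS << :sanitizer; @app.call(env); end
end

class ToyPercentHandler
  def initialize(app); @app = app; end
  def call(env); CALLS << :handler; @app.call(env); end
end

stack = []
stack.insert(0, ToyPercentHandler) # like: config.middleware.insert 0, HandleInvalidPercentEncoding
stack.insert(0, ToySanitizer)      # like: config.middleware.insert 0, Rack::UTF8Sanitizer

endpoint = ->(_env) { CALLS << :app; [200, {}, ['ok']] }

# Wrap from the inside out so that stack[0] ends up outermost.
app = stack.reverse.inject(endpoint) { |inner, klass| klass.new(inner) }
app.call({})

p stack # => [ToySanitizer, ToyPercentHandler]
p CALLS # => [:sanitizer, :handler, :app]
```

Swapping the two `insert` lines would let the handler see the request before the sanitizer has had a chance to clean it.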

Henrik N
  • I will try your solution today and tomorrow in order to accept it. Thank you! – CupraR_On_Rails Jul 15 '14 at 07:15
  • No more errors since I applied your solution! Great job! – CupraR_On_Rails Jul 16 '14 at 07:27
  • Just tried this out, and not surprisingly got a "HandleInvalidPercentEncoding not defined" error. Looking at the middleware this particular class is not defined. I thought that maybe you were piggybacking on Rails conventions for file names (MyClass maps to my_class.rb), but it's definitely not working for me. – Steven Garcia Jul 17 '14 at 20:34
  • @StevenGarcia Oh, duh, should have said that. I renamed the class as well. I'll edit the post. Thanks for pointing this out. – Henrik N Jul 17 '14 at 21:04
  • I turned this solution into a Rails engine (https://github.com/sunny/handle_invalid_percent_encoding_requests), so it can now be installed with one line in your Gemfile. – Sunny Jul 22 '14 at 17:50
  • @sunny That's excellent! I would have attempted the same if this hadn't all happened just as I was going on vacation :) – Henrik N Jul 23 '14 at 07:29
  • The require line "#{Rails.root}/app/middleware/handle_invalid_percent_encoding.rb" threw a "cannot load such file" error. I fixed it by removing that require line, and adding quotes to the middleware name: config.middleware.insert 0, "HandleInvalidPercentEncoding" – David Lesches Jul 25 '14 at 15:04
  • @DavidLesches And you had a file at that path? Strange. What Rails version? – Henrik N Jul 27 '14 at 15:37
  • @HenrikN yep. Version 3.2.13 . I had seen this issue referenced elsewhere which is how I knew the trick to enclose HandleInvalidPercentEncoding in quotes. – David Lesches Jul 29 '14 at 14:15
  • P.S. This was fixed in the latest Rack (and Rails 4.2). – Dmitry Polushkin Aug 07 '14 at 16:49
  • @DmitryPolushkin Glad to hear it! Tried to find the Rack commit but didn't see it. Just out of curiosity, could you link to it if you know? – Henrik N Aug 08 '14 at 08:13
  • Hi @HenrikN, I can't find it either, but it was there, after the Rack 1.6.0 release. – Dmitry Polushkin Aug 08 '14 at 08:45

Add this line to your Gemfile, then run bundle in your terminal:

gem "handle_invalid_percent_encoding_requests"

This solution is based on Henrik's answer, turned into a Rails Engine gem.

Sunny
  • Is this only compatible with Rails 4.1.4? – Tu H. Jul 23 '14 at 21:14
  • The gist Henrik referenced doesn't depend on Rails at all. Neither does rack-utf8-sanitizer. How do I know? I wrote the former and maintain the latter :) Thanks to Sunny for gemifying the code in the gist. – BF4 Nov 21 '14 at 04:00
  • @BF4 Thanks for the original code! I've updated my answer to make it clearer that this is a Rails Engine. If people would like to use this outside of Rails I would gladly accept pull-requests that remove the dependency. – Sunny Nov 25 '14 at 18:26

There is an issue in the gem repo with a link to someone's possible solution – they say it works for them but they're not sure if it's a good solution.

I've yet to try it, but I think I will.

Henrik N
  • I installed the gem with the pull request on two websites; I'll see tomorrow whether it brings any benefit. – CupraR_On_Rails Jul 09 '14 at 16:16
  • Since deploying this I've sadly seen one "invalid byte sequence in UTF-8" and one "invalid %-encoding" due to EasouSpider. It feels like fewer errors than before, so maybe it has helped a little. – Henrik N Jul 10 '14 at 09:32
  • I agree; I have seen some errors, but maybe fewer than before (ten today compared to a hundred before). It's really strange, because the errors look the same as the others. I don't understand why. – CupraR_On_Rails Jul 11 '14 at 07:58
  • It's too early to say if it fixed everything, but what I'm trying now is https://github.com/whitequark/rack-utf8_sanitizer/ plus this middleware https://gist.github.com/bf4/d26259acfa29f3b9882b#file-exception_app-rb – Henrik N Jul 11 '14 at 13:36
  • It seems you must be careful about the order in which you include those two middlewares: https://github.com/whitequark/rack-utf8_sanitizer/pull/15#issuecomment-48805109 – Henrik N Jul 12 '14 at 07:50