Environment

  • Rails 3.2.11
  • Ruby 1.9.1
  • utf8-cleaner gem

I have been struggling with this issue for over a year, and I cannot reproduce it in my dev environment, which makes it difficult to understand why it happens and how to resolve it. Here is the error notification I receive (by email, via ExceptionNotifier):

A ArgumentError occurred in home#index:

  invalid byte sequence in UTF-8
  .bundle/gems/ruby/1.9.1/gems/rack-1.4.5/lib/rack/utils.rb:104:in `normalize_params'

Apparently caused by a Chinese spider:

 HTTP_USER_AGENT      : Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)

I have tried a couple of things; see my earlier question, where I was attempting to catch the error.

I have also installed the utf8-cleaner gem, but that doesn't seem to resolve the problem, unless I missed a step.

How can I reproduce the issue? Note that the URL causing the problem works perfectly when I access it myself (?)

UPDATE 20140721 - use rack-utf8_sanitizer

  • Add gem 'rack-utf8_sanitizer' in Gemfile
  • Add config.middleware.insert 0, Rack::UTF8Sanitizer in application.rb
  • $ bundle install

That worked perfectly on DEV but FAILED on my Heroku PRODUCTION, with the following issue:

$ heroku run rake middleware --a test-app
Running `rake middleware` attached to terminal... up, run.4846
WARNING: Nokogiri was built against LibXML version 2.8.0, but has dynamically loaded 2.7.6
rake aborted!
uninitialized constant Rack::UTF8Sanitizer
/app/config/application.rb:71:in `<class:Application>'

I am still investigating why I am getting this error.

zabumba
  • Don't let the reader read a different page in order to understand your question. Make this question self-contained. – sawa Jul 19 '14 at 13:56

2 Answers

I managed to fix it (on a Rails 3.2.18 app) as described in this gist:

https://gist.github.com/joost/ca4eda8f31655cf6095a

joost

Reproduce the issue caused by

HTTP_USER_AGENT      : Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)

Create a Ruby script:

#!/usr/bin/env ruby
# POST a form body containing bytes that are not valid UTF-8 to the local dev server.
invalid = "data\xed\xe5\xed\xe0".force_encoding('ASCII-8BIT')
`curl localhost:3000 -d #{invalid}`
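The same bytes also trigger the error in plain Ruby, without a server. The sketch below is only an illustration: it assumes the exception comes from a regular-expression match on the parameter name, and the pattern used here is an illustrative one, not a quote from Rack.

```ruby
# Same bytes as in the script above; they do not form valid UTF-8.
raw = "data\xed\xe5\xed\xe0".force_encoding('UTF-8')

puts raw.valid_encoding?

message = nil
begin
  # Matching a regex against a string with broken encoding raises ArgumentError.
  # The pattern is illustrative (an assumption), not copied from Rack.
  raw =~ /\A[\[\]]*([^\[\]]+)\]*/
rescue ArgumentError => e
  message = e.message
end

puts message
```

This makes it possible to check a suspicious string in a console before involving curl or the browser.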

Add the rack-utf8_sanitizer gem to your Gemfile

Adding the gem resolved the issue in my dev environment, but it did not work on Heroku. I updated my question accordingly.
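To illustrate what a sanitizing middleware of this kind does, here is a rough sketch. It is a hypothetical stand-in, not the rack-utf8_sanitizer source: the class name ScrubQueryString is made up, it only looks at QUERY_STRING (not at a POST body like the one curl sends above), and it assumes Ruby 2.1+ for String#scrub.

```ruby
# Hypothetical stand-in for illustration only; NOT the rack-utf8_sanitizer code.
# Assumes Ruby 2.1+ (String#scrub) and handles only the raw query string.
class ScrubQueryString
  def initialize(app)
    @app = app
  end

  def call(env)
    query = env['QUERY_STRING']
    if query
      # Treat the raw bytes as UTF-8 and replace every invalid sequence with U+FFFD.
      env['QUERY_STRING'] = query.dup.force_encoding('UTF-8').scrub
    end
    @app.call(env)
  end
end

# Minimal Rack-style app: any object that responds to call(env).
app = lambda { |env| [200, { 'Content-Type' => 'text/plain' }, [env['QUERY_STRING']]] }
stack = ScrubQueryString.new(app)

status, _headers, body = stack.call('QUERY_STRING' => "q=\xed\xe5".b)
puts status
puts body.first.valid_encoding?
```

The point of placing such a middleware at position 0 is that the input is cleaned before any later layer tries to parse the parameters.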

UPDATE:

I added require "rack/utf8_sanitizer" to my application.rb file, and that seems to resolve the Heroku issue.
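As a config fragment, the combined setup might look roughly like this. The module name TestApp is a placeholder, and the surrounding lines are the usual Rails 3.2 boilerplate rather than my exact file:

```ruby
# config/application.rb (fragment; "TestApp" is a placeholder module name)
require File.expand_path('../boot', __FILE__)
require 'rails/all'

# Explicit require, so the constant is defined before it is referenced below.
require 'rack/utf8_sanitizer'

module TestApp
  class Application < Rails::Application
    # Put the sanitizer first in the middleware stack.
    config.middleware.insert 0, Rack::UTF8Sanitizer
  end
end
```

Running rake middleware afterwards should list Rack::UTF8Sanitizer at the top of the stack.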

zabumba