I just found a bug that occurs when I call

MyJob.perform_later(request.body.read)

with a Sidekiq-backed Active Job job.

request.body.read returns some JSON, which in some cases contains non-ASCII characters (e.g. the € symbol).

In that case I get:

Encoding::UndefinedConversionError Exception: "\xE2" from ASCII-8BIT to UTF-8

I'm aware that Sidekiq advises against complex or long job parameters. What would be a best-practice workaround?

The options I can think of are to Base64-encode the string before passing it to the job (this would make the argument even longer, and I'm not sure whether that would be a problem for Sidekiq), or to store the JSON text in a database table and pass only the id of the new row to the job. The latter would definitely work, but it looks like overkill to me.

Any suggestions?

Don Giulio

1 Answer

Sidekiq uses JSON.generate to serialize the job arguments. Here is what happens to your ASCII-8BIT string; you can run this in the console:

arg = "Example with € character".force_encoding('ASCII-8BIT')
JSON.generate([arg])
Encoding::UndefinedConversionError ("\xE2" from ASCII-8BIT to UTF-8)

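One option would be to follow this answer and force the encoding to UTF-8 before you pass the string into perform_later. force_encoding does not change the bytes, it only relabels them, so this is correct as long as the body really is UTF-8. Then it will serialize correctly: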

arg = "Example with € character".force_encoding('ASCII-8BIT')
arg.force_encoding('UTF-8')
JSON.generate([arg])
 => "[\"Example with € character\"]"

So you'd want something like:

MyJob.perform_later(request.body.read.force_encoding('UTF-8'))
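
If there is a chance that a client sends a body that is not valid UTF-8, relabeling alone would only postpone the serialization error. A minimal sketch of a guard, assuming you would rather reject such a body before enqueuing; utf8_job_argument is a hypothetical helper name, not part of Rails or Sidekiq:

```ruby
require "json"

# Hypothetical helper (not a Rails or Sidekiq API): relabels the raw
# request bytes as UTF-8 and fails early if they are not valid UTF-8.
def utf8_job_argument(raw)
  # dup so the caller's string keeps its original encoding label
  text = raw.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "body is not valid UTF-8" unless text.valid_encoding?
  text
end

# String#b returns a binary (ASCII-8BIT) copy, standing in for request.body.read
body = "Example with € character".b
JSON.generate([utf8_job_argument(body)])
```

In the controller this would become something like MyJob.perform_later(utf8_job_argument(request.body.read)), with the ArgumentError rescued and turned into whatever error response fits your API.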
cschroed