I recently brought delayed_job into my Rails 3.1.3 app. In development everything is fine. I even staged my DJ release on the same VPS as my production app using the same production application server (Thin), and everything was fine. Once I released to production, however, all hell broke loose: none of the jobs were entered into the jobs table correctly, and I started seeing the following in the logs for all processed jobs:
2012-02-18T14:41:51-0600: [Worker(delayed_job host:hope pid:12965)] NilClass# completed after 0.0151
2012-02-18T14:41:51-0600: [Worker(delayed_job host:hope pid:12965)] 1 jobs processed at 15.9666 j/s, 0 failed ...
NilClass and no method name? Certainly not correct. So I looked at the serialized handler on the job in the DB and saw:
"--- !ruby/object:Delayed::PerformableMethod\nattributes:\n id: 13\n event_id: 26\n name: memememe\n api_key: !!null \n"
No indication of a class or method name. And when I load the YAML into an object and call #object on the resulting PerformableMethod I get nil. For kicks I then fired up the console on the broken production app and delayed the same job. This time the handler looked like:
"--- !ruby/object:Delayed::PerformableMethod\nobject: !ruby/ActiveRecord:Domain\n attributes:\n id: 13\n event_id: 26\n name: memememe\n api_key: !!null \nmethod_name: :create_a\nargs: []\n"
And sure enough, that job runs fine. Puzzled, I then recalled reading something about DJ not playing nice with Thin. So I tried Unicorn and was sad to see the same result. After hours of research, I think this has something to do with how the app server loads the YAML libraries Psych and Syck, and with how DJ interacts with them. I cannot, however, pin down exactly what is wrong.
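For anyone who wants to reproduce the comparison, here is a rough diagnostic sketch that can be run both inside the app server process and in the console. It reports which YAML engine is active and lists the top-level keys of a handler without instantiating any objects. The helper names (yaml_engine, handler_summary, missing_keys) are hypothetical, the handler indentation is an assumption because the whitespace was collapsed when I pasted the strings, and the fallback to "psych" on Rubies that lack YAML::ENGINE is also an assumption.

```ruby
require "yaml"
require "psych"

# Name of the active YAML engine. YAML::ENGINE exists on Ruby 1.9.x;
# on Rubies without it we assume Psych is the only engine available.
def yaml_engine
  defined?(YAML::ENGINE) ? YAML::ENGINE.yamler : "psych"
end

# Tag and top-level keys of a serialized handler, read from the parse
# tree so that neither Delayed::PerformableMethod nor the model class
# has to be loaded.
def handler_summary(handler)
  root = Psych.parse(handler).root # top-level mapping node
  keys = root.children.each_slice(2).map { |key, _value| key.value }
  { tag: root.tag, keys: keys }
end

# Keys a PerformableMethod handler is expected to carry, judging by the
# handler that worked for me from the console.
def missing_keys(handler)
  %w[object method_name args] - handler_summary(handler)[:keys]
end

# The two handlers from this post (indentation reconstructed, see above).
BROKEN_HANDLER = "--- !ruby/object:Delayed::PerformableMethod\n" \
                 "attributes:\n  id: 13\n  event_id: 26\n" \
                 "  name: memememe\n  api_key: !!null \n"
GOOD_HANDLER = "--- !ruby/object:Delayed::PerformableMethod\n" \
               "object: !ruby/ActiveRecord:Domain\n" \
               "  attributes:\n    id: 13\n    event_id: 26\n" \
               "    name: memememe\n    api_key: !!null \n" \
               "method_name: :create_a\nargs: []\n"

puts "YAML engine: #{yaml_engine}"
puts "broken handler is missing: #{missing_keys(BROKEN_HANDLER).inspect}"
puts "good handler is missing:   #{missing_keys(GOOD_HANDLER).inspect}"
```

If the engine reported under Thin or Unicorn differs from the one reported in the console, that would at least narrow the problem down to how the YAML library is being loaded.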
Note that I'm running delayed_job 3.0.1 official, but have tried upgrading to the master branch and have even tried downgrading to 2.1.4. Here are some notable differences between my stage and production setups:
- In stage I run 1 Thin server on a TCP port -- no web proxy in front
- In production I run 2+ Thin servers and proxy to them with Nginx. They talk over a UNIX socket
- When I tried Unicorn, it was 1 app server proxied to by Nginx over a UNIX socket
Could the web proxying/Nginx have something to do with it? Please, any insight is greatly appreciated. I've spent a lot of time integrating delayed_job and would hate to have to shelve the work or, worse, toss it. Thanks for reading.