
Hey. I use delayed_job for background processing. I have an 8-CPU server running MySQL, and I start 7 delayed_job processes:

RAILS_ENV=production script/delayed_job -n 7 start 

Q1: Is it possible for 2 or more delayed_job processes to start processing the same job (the same row in the delayed_jobs table)? I checked the code of the delayed_job plugin but cannot find the kind of locking I expected (no LOCK TABLES and no SELECT ... FOR UPDATE).

I think each process should lock the database table before executing an UPDATE on the locked_by column. Instead, the workers lock the record simply by updating the locked_by field (UPDATE delayed_jobs SET locked_by...). Is that really enough? Is no explicit locking needed, and if so, why? I know that UPDATE has higher priority than SELECT, but I don't think that has any effect in this case.

My understanding of the multi-threaded situation is:

Process1: Get waiting job X. [OK]
Process2: Get waiting job X. [OK]
Process1: Update locked_by field. [OK]
Process2: Update locked_by field. [OK]
Process1: Get waiting job X. [Already processed]
Process2: Get waiting job X. [Already processed]

I think that in some cases several processes can fetch the same job and start processing it.

Q2: Is 7 delayed_job processes a good number for an 8-CPU server? Why or why not?

Thx 10x!

xpepermint

1 Answer


I think the answer to your question is in line 168 of 'lib/delayed_job/job.rb':

self.class.update_all(["locked_at = ?, locked_by = ?", now, worker], ["id = ? and (locked_at is null or locked_at < ?)", id, (now - max_run_time.to_i)])

Here the row is updated only if no other worker currently holds a valid lock on the job, and whether the lock was obtained is determined by checking whether the update actually changed the row. A table lock or similar (which, by the way, would massively reduce the performance of your app) is not needed, because your DBMS ensures that the execution of a single query is isolated from the effects of other queries. In your example, Process2 can't get the lock for job X, since its update changes the row if and only if the row was not already locked.
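To make the argument concrete, here is a minimal sketch in plain Ruby that simulates the conditional update without a database. Everything in it is hypothetical: `FakeJobsTable`, `conditional_lock`, and `race_for_job` are names made up for illustration, and the `Mutex` merely stands in for the DBMS guarantee that one UPDATE statement evaluates its WHERE clause and writes the row as a single step. It is not delayed_job's code.

```ruby
# Hypothetical in-memory stand-in for one row of the delayed_jobs table.
Job = Struct.new(:id, :locked_at, :locked_by)

# Stand-in for the DBMS. The mutex models the assumption that a single
# UPDATE checks its WHERE clause and modifies the row as one atomic step.
class FakeJobsTable
  def initialize(job)
    @job = job
    @row_lock = Mutex.new
  end

  # Mimics: UPDATE ... SET locked_at = ?, locked_by = ?
  #         WHERE id = ? AND (locked_at IS NULL OR locked_at < ?)
  # Returns the number of affected rows (0 or 1).
  def conditional_lock(id, worker, now, max_run_time)
    @row_lock.synchronize do
      return 0 unless @job.id == id
      expired = @job.locked_at && @job.locked_at < now - max_run_time
      return 0 unless @job.locked_at.nil? || expired
      @job.locked_at = now
      @job.locked_by = worker
      1
    end
  end
end

# Lets several workers race for job 12 and collects each affected-row count.
def race_for_job(worker_count)
  job = Job.new(12, nil, nil)
  table = FakeJobsTable.new(job)
  threads = Array.new(worker_count) do |i|
    Thread.new { table.conditional_lock(12, "worker-#{i}", Time.now, 4 * 60 * 60) }
  end
  [threads.map(&:value), job]
end

results, job = race_for_job(7)
puts "workers that acquired the lock: #{results.count(1)}"
puts "job locked by: #{job.locked_by}"
```

However the threads interleave, only one call sees `locked_at` as nil; the others get 0 affected rows and would move on to another job.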

To your second question: it depends. On an 8-CPU server that is dedicated to this task, 8 workers are a good starting point: since workers are single-threaded, you should run one per core. Depending on your setup, more or fewer workers may be better, and it depends heavily on your jobs. Do your jobs take advantage of multiple cores? Or do they spend most of their time waiting for external resources? You have to experiment with different settings and keep an eye on all the resources involved.
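As a rough starting point only, the "one worker per core" rule could be sketched like this. `suggested_worker_count` and its `reserved` parameter are hypothetical names for illustration; `Etc.nprocessors` is part of Ruby's standard library (Ruby 2.2 and later), and the printed command is the one from the question with the count substituted.

```ruby
require "etc"

# Hypothetical helper: one single-threaded worker per core, minus any cores
# you want to leave free for other processes (e.g. MySQL), never below one.
def suggested_worker_count(cores = Etc.nprocessors, reserved: 0)
  [cores - reserved, 1].max
end

puts "RAILS_ENV=production script/delayed_job -n #{suggested_worker_count} start"
```

With 8 cores and `reserved: 1` this gives the 7 workers from the question. Treat the result as an initial value to tune by measurement, not as a recommendation.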

gregor
  • So you are saying that each process's update is atomic and therefore safe? – xpepermint Apr 25 '10 at 14:24
  • What I think is missing here is SELECT ... FOR UPDATE? – xpepermint Apr 25 '10 at 15:25
  • The query is atomic. Hence if you execute the query `UPDATE jobs SET locked_at = '..', locked_by = 1 WHERE id = 12 and (locked_at is null or locked_at < '..')`, then locked_at and locked_by are only updated if there is no other valid lock. The DBMS first checks the WHERE condition and then executes the update, ensuring that the row is not changed in between. Hence you can't overwrite an existing lock. – gregor Apr 25 '10 at 17:20
  • Of course you could use `SELECT ... FOR UPDATE`, but this is not necessary and it is difficult to implement across the various DBMSs, since their locking mechanisms all differ. – gregor Apr 25 '10 at 17:28
  • FWIW the locking code (lock_exclusively!) now lives in the pluggable backend. E.g. for ActiveRecord this is in https://github.com/collectiveidea/delayed_job_active_record/blob/master/lib/delayed/backend/active_record.rb – rboyd Mar 21 '12 at 18:16
  • Not sure if locking would do better but with replication I get: `[Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. The statement is unsafe because it uses a LIMIT clause. This is unsafe because the set of rows included cannot be predicted. Statement: UPDATE \`delayed_jobs\` SET \`delayed_jobs\`.\`locked_at\` = '2014-10-25 02:00:25', \`delayed_jobs\`.\`locked_by\` = 'delayed_job host:www1 pid:15229' WHERE ((run_at <= '2014-10-25 02:00:25' AND (locked_at IS NULL OR locked_at < '2014-10-24 22:00:25') OR [..]` – 2called-chaos Oct 25 '14 at 02:02