0

currently I have in one system implemented solution for my problem. But I think there exists better solution, but I have not found it yet. I hope someone could find couliton or hint in me some direciton.

I have system for spliting different kind of jobs. Each work is in PostgreSQL database in table "jobs", where one row is one job (can be done multiple times). Each job has an attribute how many times it can be done (how many users can work on it). So my table jobs works like this:

ID_JOB NAME DONE HAS_TO_BE_DONE
1 Puzzle Solving 2  3
2 Washing Dishes 1 3

When users comes and ask for job:

  1. User gets id of job when job meets condition DONE < HAS_TO_BE_DONE
  2. For given job value DONE is incremented ( +1 )

It works, but when more users are working simultaneously I need to everytime LOCK (type: SERIALIZABLE) database (even for reading).

If I would not used database locking probled could occur: User #1 comes - he is in step 1, (he knows Id of job he got), in same time (a few miliseconds later) comes User #2 and will ask for job and will be in step 1 and will get same job as User #1 because #User 1 did not managed step 2 yet (and didn't incremente DONE). This is reason why I have to completely lock database.

Does anyone know how to improve this method? Currently it works, but in moment when 100 users are working simultaneously database locking is causing slowing application. Problems are these two steps when in first step I will find job for user and then in second step I increment value.

If anyone would come with better solution I would be happy. Thank you

Jim Macaulay
  • 4,709
  • 4
  • 28
  • 53
LostyBoj
  • 43
  • 7

1 Answers1

0

The simplest approach is to check and update in a single statement.

update jobs
set done = done + 1
where done < has_to_be_done
  and id = ?

If it updates a row, there was a job to be done. If it doesn't, there was no job to be done.


But there's a problem. You only have two statuses: job needs to be done, job is done. What if 5 users are all working on the same job, but it only needs to be done twice? That's wasteful. You need three statuses:

  • queued
  • in progress
  • done

You could add these as columns to the table and increment/decrement them.

create table jobs (
  id bigserial primary key,
  name text not null,
  queued integer not null default 0 check (queued >= 0),
  in_progress integer not null default 0 check (in_progress >= 0),
  done integer not null default 0 check (done >= 0)
);

Queue a job.

update jobs
set queued = queued + 1
where id = ?

Start a job. If it updates a column, you got a job. If it doesn't, there are no jobs in the queue.

update jobs
set queued = queued - 1,
    in_progress = in_progress + 1
where queued > 0
  and id = ?

Finish a job.

update jobs
set in_progress = in_progress - 1,
    done = done + 1
where in_progress > 0
  and id = ?

Or, for even finer control, split it into two tables: jobs and a job queue.

create table jobs (
  id bigserial primary key
  name text not null
);

create table job_queue (
   id bigserial primary key
   job_id bigint not null references jobs(id) on delete cascade
);

jobs
id    name
1     Puzzle Solving
2     Washing Dishes

job_queue
id job_id
3  1
5  2
6  2

Queue a job by inserting it.

insert into job_queue (job_id) values (?)

Finish a job by deleting one entry. It's not necessary to first check. If a row is deleted, you finished a job.

delete from jobs_queue
where id = (
  select id from jobs_queue
  where job_id = ?
  limit 1
)

We can combine the two. Flags and a job_queue table. We can store more information like who is doing the work, when it was queued, and so on.

create table job_queue (
   id bigserial primary key
   job_id bigint not null references jobs(id) on delete cascade,
   status text not null default 'queued',
   user_id bigint references users(id) on delete set null,
   created_at timestamp not null default current_timestamp,
   updated_at timestamp not null default current_timestamp
);

Queue a job.

insert into job_queue (job_id) values (?)

Start a job. Again, no check necessary. Instead, check if a row was updated.

update job_queue
set status = 'in progress', user_id = ?, updated_at = current_timestamp
where id = (
  select id
  from job_queue
  where status = 'queued'
  limit 1
)
returning id

Finish a job. Use the ID returned from the update if you have it.

update job_queue
set status = 'done', updated_at = current_timestamp
where id = ?

Or find a queued job you have in progress and update that.

update job_queue
set status = 'done', updated_at = current_timestamp
where id = (
  select id
  from job_queue
  where user_id = ?
    and job_id = ?
    and status = 'in progress'
  limit 1
)
Schwern
  • 153,029
  • 25
  • 195
  • 336
  • Interesting but I do not think this will solve problem if two user will come in same time. If USER1 get job with ID 1 (first step - first SQL query), then would come USER2 and got job with ID 1 too. Then as a second step USER 1 deletes entry from `jobs_todo`. Then USER2 tries to delete etnry from `jobs_todo`. – LostyBoj May 31 '20 at 17:53
  • Problem is in first step I will get ID and THEN in second step I increment (or I would create flag that someone is doing this job). And if more users working in same time then in time between first and second step anyone else could come and ask for job. – LostyBoj May 31 '20 at 17:56
  • @LostyBoj I've expanded the answer to make things more clear, and to give more options. – Schwern May 31 '20 at 19:11
  • Thank you. Is there any way ho to get ID from this query? ` update jobs set done = done + 1 where done < has_to_be_done ` If yes, it would solve problem. If not then in all your ways you are doing it in two sql queries and therefor I need to lock database and I am in same state as I currently am. – LostyBoj Jun 02 '20 at 07:17
  • @LostyBoj Yes, add `returning id`. But shouldn't you already know the id of the job you're saying is done? – Schwern Jun 02 '20 at 07:52
  • Thats the problem. I need to get ID from database and in same time update that job has been taken right now. I dont need to track how is working on what (already done in different parts). Main problem is to get ID from database and same time update value in table in ONE STEP. your code is great but you are assuming I already have ID of job. I do not. User will come and ask for job - then will get ID and value in database muset be incremented to avoid giving the same job to some one else. When it is all made during two steps it may cause problems if table is not locked. Btw thank you. – LostyBoj Jun 02 '20 at 11:27