0

I am running multiple background worker processes in my Postgres database. I am doing this using Pg_Cron extension. I cannot unfortunately use Pg_Timetables, as suggested by another user here.

Thus, I have 5 dependent "Jobs" that need 1 other independent Procedure/Function to execute and complete before they can start. I originally having my Cron jobs simply check every 30minutes or-so some "job_log" table I created to see if the independent Job completed (i.e. if yes, execute Procedure, if not, return out of Procedure and check at next Cron interval)

However, I believe I could simplify the way I am triggering/orchestrating all these Jobs/Procedures greatly if I utilize pg_sleep and start all the Jobs at one -time (so no more checking every 30minutes). I would be running these Jobs in the night time concurrently so I believe it shouldn't effect my actual traffic that much.

i.e.

WHILE some_variable != some_condition LOOP
   PERFORM pg_sleep(1);
   some_variable := some_value; -- update variable here
END LOOP;

My question is

  1. Would starting all these Jobs at one time (i.e. Setting a concrete time in the Cron expression e.g. 15 18 * * *), and utilizing pg_sleep be bad practice/inefficient as I would be idling 5 background workers while 1 Job completes. The 1 job these are dependent on could take any amount of time to finish i.e. 15 min, 30 min, 1hr (should be < 1 hr though).

Or is better to simply just use a Cron expression to check every 5min or so if the main/independent Job is done, so my other Jobs that are dependent can then run?

ennth
  • 1,698
  • 5
  • 31
  • 63
  • I don't know that function in detail, the documentation just warns "Make sure that your session does not hold more locks than necessary when calling pg_sleep or its variants. Otherwise other sessions might have to wait for your sleeping process, slowing down the entire system.". I would suggest an other solution based on a dedicated table used to synchronize the functions : at the end of the independent function, you insert 5 rows in that table, each one triggers one of the dependent functions. At the end of each of the dependent function, you delete the corresponding row in the table. – Edouard Oct 25 '21 at 21:05
  • Have you considered `LISTEN` / `NOTIFY`? See: https://stackoverflow.com/a/21636434/939860, https://stackoverflow.com/a/13261411/939860 – Erwin Brandstetter Oct 26 '21 at 03:17

1 Answers1

0

Running two schedulers, one of them home-built, seems more complex than just running one scheduler that does 2 (or 6, 1+5, however you count it) different things. If your goal is to make things simpler, does your proposal really achieve that?

I wouldn't worry about 5 backends sleeping on pg_sleep at the same time. But you might worry about them holding back the xid horizon while they do so, which would make vacuuming and HOT pruning less effective. But if you already have one long-running task in one snapshot (the thing they are waiting for) then more of them aren't going to make the matter worse.

jjanes
  • 37,812
  • 5
  • 27
  • 34