
How do I handle a "race condition" between instances of a script that is scheduled to run every minute, performing the following tasks for every file in a directory:

  1. Connect to the SQL database and check the last element (filename) in a table
  2. Create several files (in multiple folders) with the next available filename
  3. Insert into SQL a new record with the filename and the created files' information

Because the process runs every minute, it's possible for two instances to overlap and work on the same files. I can prevent that with file locking, skipping files that are already open, but the problem persists with:

  • Checking the next available filename in the database (two processes picking the same filename)
  • Creating files with that filename

Process A picks up inputA.jpg and finds the next available filename to be image_01.

Process B picks up inputB.jpg and also finds the next available filename to be image_01.

And so the chaos begins...

Unfortunately, I can't insert a placeholder record into the SQL table to mark that the next filename is being processed.

Pseudo-code of the loop:

foreach my $file (@files)
{
    my $name  = findFileNameInSql($file);    # query DB for next available name
    my $path1 = createFile($name, $settings1);
    my $path2 = createFile($name, $settings2);
    my $path3 = createFile($name, $settings3);
    addToSql($file, $name, $path1, $path2, $path3);
}

The actual code is a bit more complicated, including file modifications and a transactional insert into two SQL tables. If createFile() fails, the application rolls back all previously created files. This obviously causes problems when one instance is creating file "abc" and a second instance errors out because file "abc" already exists.

EDIT:

Sure, limiting the script to a single running instance could be a solution, but I was hoping to find a way to run instances in parallel. If there's no way to do it, we can close this as a duplicate.

yosh
  • http://stackoverflow.com/a/27434074/223226 – mpapec Jan 09 '15 at 14:29
  • @Сухой27 That's actually a pretty easy solution. I was aiming for something that wouldn't prevent the 2nd instance from starting if the 1st instance takes longer than 60 seconds, but this might be enough. Thanks. – yosh Jan 09 '15 at 14:51
  • Can always `flock` a lockfile, and compare mtimes. – Sobrique Jan 09 '15 at 15:06
  • possible duplicate of [How to prevent a Perl script from running more than once in parallel](http://stackoverflow.com/questions/27433252/how-to-prevent-a-perl-script-from-running-more-than-once-in-parallel) – pilcrow Jan 10 '15 at 15:53

1 Answer


You need to make the operation that returns the next available filename atomic within the database, so that the database can never hand out the same filename twice. This is really a database problem rather than a Perl problem, per se.

You don't say which database you're using, but there are several ways to do this. A naive, brute-force approach in MySQL is for the Perl script to issue LOCK TABLES ... WRITE on the table holding the filenames while it calculates the new name and does its work; once the table has been updated with the new filename, you can release the lock. Table locks don't play nicely with transactions, though.
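
A minimal sketch of that approach with DBI, assuming a MySQL table named `files` with `id` and `name` columns (the table, columns, and the `next_name()` helper are illustrative, not from the question):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=mydb', 'user', 'pass',
                       { RaiseError => 1 });

# Exclusive write lock: no other instance can read or insert
# filenames until we release it.
$dbh->do('LOCK TABLES files WRITE');

my ($last) = $dbh->selectrow_array(
    'SELECT name FROM files ORDER BY id DESC LIMIT 1');
my $name = next_name($last);    # your "next available name" logic

# ... create the files on disk here, while the lock is held ...

$dbh->do('INSERT INTO files (name) VALUES (?)', undef, $name);
$dbh->do('UNLOCK TABLES');      # the new row is now visible to others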

Or you could do something rather more elegant, like implementing a stored procedure, with appropriate locking, inside the database itself to return the new filename.
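
The Perl side then stays trivial. A sketch only: `next_filename` is a hypothetical procedure that bumps a counter row atomically and returns the resulting name through a session variable:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=mydb', 'user', 'pass',
                       { RaiseError => 1 });

# All the locking happens inside the procedure, in the database.
$dbh->do('CALL next_filename(@name)');
my ($name) = $dbh->selectrow_array('SELECT @name');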

Or use an AUTO_INCREMENT column, so that each time you add a row to the table you get a new number (and hence a new filename).
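
Something along these lines, again with an illustrative `files` table. Note that it reserves the row before the files exist, which the question says this schema can't accommodate, so treat it purely as a sketch of the atomic numbering:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=mydb', 'user', 'pass',
                       { RaiseError => 1 });

my $file = 'inputA.jpg';    # the source file being processed

# The AUTO_INCREMENT id is handed out atomically by the database,
# so two instances can never receive the same number.
$dbh->do('INSERT INTO files (source) VALUES (?)', undef, $file);
my $id   = $dbh->last_insert_id(undef, undef, 'files', 'id');
my $name = sprintf 'image_%02d', $id;

# ... create the files on disk, then UPDATE the row with their paths ...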

This can all get quite complicated, though; if you have multiple transactions in flight simultaneously, how the database resolves them is usually configurable, so I can't tell you exactly what will happen.

Given that it sounds as though your code is principally reorganising data on disk, there's not much advantage in having multiple jobs running at the same time; this code is probably I/O-bound anyway. In that case, it's much simpler just to make the changes others have suggested and run only one copy at a time.
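
For completeness, the usual Perl idiom for that, much as the comments above suggest, is a non-blocking flock taken at startup (a sketch; the lockfile path is arbitrary):

use strict;
use warnings;
use Fcntl ':flock';

# If another instance already holds the lock, exit quietly instead
# of overlapping with it.
open my $lock, '>', '/tmp/filejob.lock' or die "lockfile: $!";
flock($lock, LOCK_EX | LOCK_NB) or exit 0;

# ... rest of the script; the lock is released automatically when
# the process exits.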

Tim Cutts