I have several scripts. Each of them does some computation that is completely independent of the others. Once these computations are done, their results are saved to disk and a record is updated.

The record is maintained by an instance of a class, which saves itself to disk. I would like to use a single record instance across multiple scripts (for example, `record_manager = RecordManager(file_on_disk)` followed by `record_manager.update(...)`). I can't do this right now, because when the record is updated there may be concurrent write accesses to the same file on disk, which leads to data loss. So I have a separate record manager for every script, and I merge the records manually later.

What is the easiest way to use a single instance in all the scripts while solving the concurrent write access problem?

I am using macOS (High Sierra) and Linux (Ubuntu 16.04).

Thanks!

Ant
  • Have you considered running only one instance of the object for all of your scripts and using inter-process communication to tell it what to write? – AntiMatterDynamite Jan 09 '18 at 09:47
  • I'm pretty sure the answer to this may depend on your OS, so please provide that. – sometimesiwritecode Jan 09 '18 at 09:48
  • It is very common to have multiple logs (or whatever you want to call them) and merge them when they are all done. If they are independent, why bother at all? – user1767754 Jan 09 '18 at 09:55
  • @AntiMatterDynamite Can you elaborate on that? What do you mean? – Ant Jan 09 '18 at 10:01
  • @majorcoder Linux and macOS. I've updated the question. – Ant Jan 09 '18 at 10:01
  • @user1767754 Their execution is independent, but conceptually they're related. – Ant Jan 09 '18 at 10:02
  • @Ant I think to give you better advice you would have to come up with more concrete examples. We do similar stuff (Task-Que) where we send a bunch of semi-dependent tasks as graphs to our compute cloud, and based on some rule sets they are either executed independently or made to wait. As long as you manage the `locks` and `waits` you are fine. – user1767754 Jan 09 '18 at 10:06
  • I'm not sure about Mac, but in Linux you can lock a file and use the lock as an access mutex for the file itself. As long as all your scripts respect the file lock, it can be a pretty simple solution. – AntiMatterDynamite Jan 09 '18 at 10:28
  • @AntiMatterDynamite Ah, this could actually be a very simple solution. Each script acquires the lock, writes, then releases the lock. If the file is locked, it waits. Easy! What would you suggest using? Something like https://docs.python.org/3/library/fcntl.html#module-fcntl ? The docs mark it as Unix-only, so it should be available on both macOS and Linux. Thank you for your answer anyhow! – Ant Jan 09 '18 at 11:21
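A minimal sketch of the lock-file convention discussed in these comments, using `fcntl.flock` from the standard library (Unix-only, which covers macOS and Linux). It assumes the record is a JSON dict and that every script goes through the same class; the `RecordManager` body, the `update(key, value)` signature, and the `.lock`/`.tmp` sidecar file names are hypothetical, not the asker's actual code.

```python
import fcntl
import json
import os


class RecordManager:
    """Hypothetical record manager: a JSON dict on disk, guarded by a flock'd lock file."""

    def __init__(self, path):
        self.path = path
        self.lock_path = path + ".lock"  # assumption: sidecar lock file shared by all scripts

    def update(self, key, value):
        # Every process opens the same lock file; "a" creates it without truncating.
        with open(self.lock_path, "a") as lock_file:
            # LOCK_EX blocks until no other process holds the lock.
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            try:
                # The whole read-modify-write happens while the lock is held.
                record = self._load()
                record[key] = value
                self._save(record)
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)  # closing the file would also release it

    def _load(self):
        try:
            with open(self.path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}  # first update ever: start with an empty record

    def _save(self, record):
        # Write a temp file, then atomically rename it over the record,
        # so a crash mid-write never leaves a half-written record behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(record, f)
        os.replace(tmp_path, self.path)
```

Each script would create its own `RecordManager("records.json")` and call `update`. The lock lives in a separate file because `os.replace` swaps in a new file, and a lock held on the old file would not protect the new one. `flock` locks are advisory: they only exclude processes that also call `flock` on the same lock file, which is exactly the convention proposed in the comments.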

1 Answer

To build a custom solution to this, you will probably need to write a short new queuing module. This queuing module alone would have write access to the file(s), and the existing modules in your code would pass their write actions to it.

The queuing logic itself should follow a fairly straightforward queue architecture.
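A minimal sketch of this single-writer idea, assuming all scripts can be started from one launcher process and the record is a JSON dict. Workers never touch the file; they send `(key, value)` messages over a `multiprocessing.Queue` to one writer process. The names `record_writer`, `compute`, `run`, and the `STOP` sentinel are hypothetical.

```python
import json
import multiprocessing as mp

STOP = None  # hypothetical sentinel telling the writer to shut down


def record_writer(queue, path):
    """Single writer: the only code that touches the record file."""
    try:
        with open(path) as f:
            record = json.load(f)  # keep records from earlier runs
    except FileNotFoundError:
        record = {}
    while True:
        message = queue.get()  # blocks until a worker sends something
        if message is STOP:
            break
        key, value = message
        record[key] = value
        # No lock needed: no other process ever writes this file.
        with open(path, "w") as f:
            json.dump(record, f)
    return record


def compute(task_id, queue):
    """Stand-in for one independent computation (hypothetical)."""
    result = task_id * task_id
    queue.put((f"task{task_id}", result))  # send a write action instead of writing


def run(num_tasks, path):
    """Launch one writer and several workers from a single parent process."""
    queue = mp.Queue()
    writer = mp.Process(target=record_writer, args=(queue, path))
    writer.start()
    workers = [mp.Process(target=compute, args=(i, queue)) for i in range(num_tasks)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    queue.put(STOP)  # every worker has finished, so all results are already queued
    writer.join()
```

A launcher script would call `run(4, "records.json")` under an `if __name__ == "__main__":` guard, which `multiprocessing` requires with the spawn start method (the default on macOS since Python 3.8). The trade-off of this design is that every script has to be started by the same parent process.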

There may also be existing Python libraries that handle this problem, which would save you from writing your own queue class.

Finally, it is possible that your OS could handle this whole thing in some way, independently of Python.

sometimesiwritecode
  • Thanks for your answer! So this queuing module would need to run as a separate process that all of my scripts interact with, right? I guess I can try to do it if it is the only option, but I would like to avoid the added complexity if possible. – Ant Jan 09 '18 at 10:04
  • Here is an example of a similar custom module: https://stackoverflow.com/questions/6524635/writing-to-a-file-with-multiprocessing I understand the desire to avoid this complexity; there must be a library or built-in support for handling this, as it seems like a common problem. – sometimesiwritecode Jan 09 '18 at 10:09
  • Thanks, that seems a nice solution. What do you think about a simple lock on the file, though? – Ant Jan 09 '18 at 11:49
  • I'm not sure, but I think macOS imposes a lock on a file that is being written to by default. That would mean any second write attempted while the first one is underway would simply be rejected and never carried out, since it is a simple lock rather than a queue. That could be wrong, but I believe I read that about macOS earlier. – sometimesiwritecode Jan 09 '18 at 11:50
  • Interesting... but I don't need macOS to behave like a queue, right? I acquire the lock, write, and release the lock. If the file is already locked, I wait. As long as every script respects this convention, it should be fine. And every script will, since they only interact with the files through the record_manager class, which is then the only one that needs to implement this locking procedure. – Ant Jan 09 '18 at 11:59
  • I'm not sure what you mean; what you just described is a queue IMO, specifically because of the statement "if file is already locked, wait". That is not the default behavior of OS locking as I understand it. The default locking has no queue, so it would be "if the file is already locked, throw away this update" instead of "if the file is already locked, wait." I'm pretty stoned so I hope that makes sense – sometimesiwritecode Jan 09 '18 at 12:05
  • Ahah sure, it makes sense :D I don't want to get too technical about the behavior of queues and locks, but the point is that this way I just need to change a couple of lines in the saving function. If I implement a queue like the one that you linked, I need to write the code *and* I need to start all scripts at the same time from the same "main script" that implements that queue. I can't start one script now, then stop it, launch another couple, then stop them; the lock system, instead, allows all of this with less added complexity. Do you agree? – Ant Jan 09 '18 at 12:12
  • Yes! Maybe you could add a while loop with a short sleep to the save command: it checks whether the file is locked or being edited and, if it is, that save call sleeps and checks again in half a second, and so on. – sometimesiwritecode Jan 09 '18 at 19:04
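A minimal sketch of the polling loop suggested in the last comment, assuming a separate lock file shared by all scripts. It tries a non-blocking `fcntl.flock` (`LOCK_EX | LOCK_NB`), which raises instead of waiting when another process holds the lock, and sleeps half a second between attempts. The `locked` helper and its `poll_interval`/`timeout` parameters are hypothetical.

```python
import contextlib
import fcntl
import time


@contextlib.contextmanager
def locked(lock_path, poll_interval=0.5, timeout=None):
    """Hypothetical helper: hold an exclusive flock, polling until it is free."""
    deadline = None if timeout is None else time.monotonic() + timeout
    with open(lock_path, "a") as lock_file:
        while True:
            try:
                # LOCK_NB makes flock raise instead of blocking if the lock is held.
                fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if deadline is not None and time.monotonic() >= deadline:
                    raise TimeoutError(f"could not lock {lock_path}")
                time.sleep(poll_interval)  # someone else is writing; check again later
        try:
            yield  # caller does its read-modify-write here
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Each save would wrap its read-modify-write in `with locked("records.json.lock"):`. With `timeout=None` this waits indefinitely, which is what a plain blocking `fcntl.flock(f, fcntl.LOCK_EX)` already does without any polling, so the loop mainly helps if you want to give up after a timeout. Either way, `flock` never silently discards a write: a second process waits (or, with `LOCK_NB`, gets an exception).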