Check every Minute if there was an *.odb file generated or not. If yes --> Get Data

Question

I would like to check every minute whether a file like "RESULTS.ODB" has been generated and, if this file is bigger than 1.5 gigabytes, start another subprocess to get the data from it. How can I make sure that the file is no longer being written and that everything is included?

I hope you know what I mean. Any ideas on how to handle that?

Thank you very much. :)

Try to open it for writing (+w) on your own. If the file is locked someone else still has write permissions on the file. — Datz, Apr 30 '18 at 09:49

Hannu · Answer 1 · 2018-04-30T11:02:44.263

If you have no control over the writing process, then you are at some point bound to fail somewhere.

If you do have control over the writer, a simple way to "lock" files is to create a symlink. If your symlink creation fails, there is already a write in progress. If it succeeds, you just acquired the "lock".
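A minimal sketch of the symlink idea, assuming a Unix platform and that both the writer and the reader agree on a lock path. The names `try_acquire`, `release` and the lock path used in the note below are made up for illustration:

```python
import os


def try_acquire(lock_path):
    """Try to take the lock; return True on success, False if it is held."""
    try:
        # Creating a symlink is atomic and fails if lock_path already exists.
        # The link target is only a label and does not have to exist.
        os.symlink(f"pid-{os.getpid()}", lock_path)
        return True
    except FileExistsError:
        return False


def release(lock_path):
    """Give the lock back by removing the symlink."""
    os.unlink(lock_path)
```

Both sides would call something like `try_acquire("RESULTS.ODB.lock")` before touching the file and `release(...)` afterwards. Note that a process which crashes while holding the lock leaves a stale symlink behind, which has to be cleaned up some other way.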

But if you do not have any control over the writing and creation of the file, there will be trouble. You can try the approach outlined in "Ensuring that my program is not doing a concurrent file write". It reads the timestamps of the file and "guesses" from them whether writing has completed. This is more reliable than checking the file size alone, as you could end up with a file over your size threshold while writing is still in progress.
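As a rough sketch of such a timestamp check (not the code from the linked question), the function below combines the size threshold with the age of the last modification. The name `looks_finished` and its parameters are assumptions for illustration:

```python
import os
import time


def looks_finished(path, min_size, quiet_seconds, now=None):
    """Guess whether writing has completed: the file must be big enough
    and must not have been modified for at least quiet_seconds."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    now = time.time() if now is None else now
    return st.st_size >= min_size and (now - st.st_mtime) >= quiet_seconds
```

This is still only a guess: a writer that pauses for longer than `quiet_seconds` in the middle of a write would fool it.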

In this case the remaining problem would be the writer starting a new write before you have read the file in its entirety. Your reader would then fail when the file it was reading disappeared halfway through.

If you are on a Unix platform, have no control over the writer and absolutely need to do this, I would do something like this:

1. Check whether the file exists and, if it does, whether its "last written" timestamp is "old enough" to assume the file is complete
2. Rename the file to a different name
3. Check that the renamed file still matches your criteria
4. Get the data from the renamed file
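The steps above could be sketched roughly as follows. The function name `claim_file`, the helper `_matches` and the parameters are hypothetical, and what to do when the re-check fails is left to the caller:

```python
import os
import time


def _matches(path, min_size, quiet_seconds):
    """True if the file exists, is big enough and has been quiet long enough."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    return st.st_size >= min_size and (time.time() - st.st_mtime) >= quiet_seconds


def claim_file(path, claimed_path, min_size, quiet_seconds):
    """Rename path to claimed_path if it looks complete.

    Returns claimed_path when the file is ready to be read, else None.
    """
    # Check that the file exists and is "old enough".
    if not _matches(path, min_size, quiet_seconds):
        return None
    # Rename it so a later write cannot replace it while we read.
    try:
        os.rename(path, claimed_path)
    except FileNotFoundError:
        return None  # the file vanished between the check and the rename
    # Re-check the renamed file; a writer that still had it open
    # would have updated its modification time.
    if not _matches(claimed_path, min_size, quiet_seconds):
        return None  # the file stays at claimed_path; the caller decides what to do
    return claimed_path
```

The data would then be read from the returned path. The rename keeps the modification time, so the second check still sees when the file was last written.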

Nevertheless, this will eventually fail and you will lose an update, as there is no way to make this atomic. Renaming removes the problem of the file being overwritten before you have read it, but if the writer decides to start writing between steps 1 and 2, you will not only receive an incomplete file, you might also break the writer if it does not like the file disappearing halfway through.

I would rather try to find a way to chain the actions together: either have your writer trigger the read process, or add a locking mechanism. Writing 1.5 GB of data is not instantaneous, and eventually the unexpected will happen.

Or, if you definitely cannot do anything like that, could you ensure, for example, that your writer writes at most once every N minutes or so? If you could be sure it never writes twice within a 5-minute window, you would wait in your reader until the file is 3 minutes old, then rename it and read the renamed file. You could also check whether you can prevent the writer from overwriting an existing file. If you can, then you can safely process the file in your reader once it is "old enough" and has not changed during whatever grace period you decide to give it, and when you have read it, you delete the file, allowing the next update to appear.
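A minimal polling sketch for that last variant, assuming the writer never overwrites an existing file. The function `watch`, its parameters and the `process` callback are hypothetical; `max_polls` only exists so the loop can be stopped:

```python
import os
import time


def watch(path, process, min_size, grace_seconds, poll_seconds=60, max_polls=None):
    """Poll for path; process and delete it once it is big and old enough.

    Returns the number of files handled.
    """
    handled = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            st = os.stat(path)
        except FileNotFoundError:
            st = None
        if (st is not None and st.st_size >= min_size
                and time.time() - st.st_mtime >= grace_seconds):
            process(path)    # e.g. start the subprocess that gets the data
            os.remove(path)  # lets the writer produce the next update
            handled += 1
        time.sleep(poll_seconds)
    return handled
```

For the case in the question this might be called with `min_size=int(1.5 * 1024**3)`, `grace_seconds=180` and the default one-minute poll interval, with `process` being whatever function launches the extraction.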

Without knowing more about your environment and processes involved this is the best I can come up with. But there is no universal solution to this problem. It needs a workaround that is tailored to your particular environment.
