
We have a shared folder which contains files that need to be processed. We also have 3 UNIX servers, each running a shell script that takes and processes one file at a time. At the end of the script the file is moved away. The 3 UNIX servers don't communicate with each other and are not aware of each other.

In your opinion, what is the best way to guarantee that each file is processed exactly once, without raising concurrent access issues/errors?

Duncan_McCloud
  • Find a file and rename it to "filename.lock.servername" as the first action in the script. Ignore all files that end in ".lock.servername" at the file-finding stage – Vorsprung Apr 07 '14 at 13:05
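
For illustration, a minimal sketch of this rename-based claiming idea; `/shared/incoming`, `/shared/done` and `process_file` are placeholders, not part of the question. `mv` within one filesystem boils down to a `rename()`, so only one server can successfully claim a given file.

```sh
#!/bin/sh
# Sketch of the rename-based claiming idea from the comment above.
# /shared/incoming, /shared/done and process_file are placeholders.
SERVER=$(hostname)

for f in /shared/incoming/*; do
    case "$f" in
        *.lock.*) continue ;;            # already claimed by some server
    esac
    claimed="$f.lock.$SERVER"
    # If another server renamed the file first, this mv fails and we skip it.
    if mv "$f" "$claimed" 2>/dev/null; then
        process_file "$claimed"          # placeholder processing step
        mv "$claimed" /shared/done/      # move the file away when done
    fi
done
```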

1 Answer


Either way, you need some type of file-locking mechanism. Some of the possibilities:

  • First, you can create a temporary lock file for every file being worked on. For example, for the file name.ext you create name.ext.lock just before you start processing it. If that lock file already exists (i.e. its creation fails with a "file exists" error), somebody is already working on the file, so you shouldn't touch it (see the sketch after this list).
  • Second, you could use advisory locks. Advisory locking doesn't work on every type of file sharing, and it has only a libc-level interface, so you can't use it from shell scripts. I suggest digging into the manual of the flock libc API call.
  • Third, the hardest option, and deeply Unix-specific: mandatory locks. Mandatory locking means that the locks are effective even against processes that don't know anything about them. You can read more about them here: Mandatory file lock on linux
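
Here is a minimal sketch of the first approach, using `mkdir` for the lock (as recommended in the comments below) because directory creation is atomic: it either succeeds or fails with "file exists". The paths and the `process_file` command are placeholders for your own setup.

```sh
#!/bin/sh
# Sketch of option 1: one lock per file, taken with mkdir.
# /shared/incoming, /shared/done and process_file are placeholders.
for f in /shared/incoming/*; do
    case "$f" in
        *.lock) continue ;;              # never treat lock entries as work
    esac
    lock="$f.lock"
    if mkdir "$lock" 2>/dev/null; then   # atomic: only one server succeeds
        process_file "$f"                # placeholder processing step
        mv "$f" /shared/done/            # move the file away, as in the question
        rmdir "$lock"
    fi                                   # otherwise another server owns it
done
```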

In your place I would go with the first, if I could modify how the processing works (for example, if I could hook it with a script, or if I were developing the processing script myself). If not, you probably need the third, although it doesn't always work.

peterh
  • Thank you very much Peter, especially for advice #2 and #3. I will wait some time to hear other possible approaches, then I will mark your answer as accepted – Duncan_McCloud Apr 07 '14 at 13:31
  • I think if you use the first method, you should pay attention to the race condition problem. Of course the `flock` function would handle this for us, but it is not suitable for shell scripts. – zhujs Apr 08 '14 at 01:20
  • @zhujs can you explain the "race condition" problem a little more? – Duncan_McCloud Apr 08 '14 at 06:46
  • @zhujs I think i have found it: http://stackoverflow.com/questions/34510/what-is-a-race-condition – Duncan_McCloud Apr 08 '14 at 06:47
  • @Duncan_McCloud & zhujs Yes, you are right. Of course you should do this locking _only_ with _atomic_ operations! For example, `if not exists something.lck; then create something.lck; fi` would not be okay! A very pleasant atomic operation is `mkdir`, or maybe an atomic `touch`-style creation. I started an interesting thread about the case, but I suggest handling it in another question. – peterh Apr 08 '14 at 07:38
  • @PeterHorvath Can the `touch` command be used as an atomic operation? I can't find an option supporting this. – zhujs Apr 08 '14 at 12:12
  • @zhujs Not the `touch` tool itself. I was thinking of the `open` syscall with the `O_CREAT|O_EXCL` flags. But in shell scripts I normally use `mkdir`. – peterh Apr 08 '14 at 15:23
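
As a hedged follow-up to the comment thread, this is what atomic file creation (rather than `mkdir`) can look like from a shell: under bash's `noclobber` option, a redirection refuses to overwrite an existing file, so the check and the creation happen in a single step. The lock file path is a placeholder.

```bash
#!/bin/bash
# Taking a lock by atomic file creation instead of mkdir; the path is a placeholder.
lockfile=/shared/incoming/name.ext.lock

if ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null; then
    # We own the lock: process name.ext here, then release it.
    rm -f "$lockfile"
else
    echo "someone else is already working on name.ext" >&2
fi
```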