0

I have a requirement where a scheduler triggers a task with a fixed delay of 2 minutes. The task picks up all the files from a directory (e.g. abc) and distributes them to multiple threads for processing. Each thread does the following:
1. Reads the data from a particular file (e.g. file1.csv).
2. Validates the data, appends some more data, and writes the result to another file (e.g. file1-updated.csv) in the updated directory (e.g. xyz).
3. Deletes the input file file1.csv from directory abc.

Files are pushed to the abc directory dynamically from another server whenever the end user performs some action. Every 2 minutes the scheduler triggers, picks up all the files, and distributes them to threads as explained above. Now the question: let's say there were 2 files, file1.csv and file2.csv, and the scheduler picked them up and distributed them to threads on the first trigger. Then file3.csv is pushed to the abc directory and the scheduler triggers again after 2 minutes. This time only file3.csv should be distributed to threads, not file1.csv and file2.csv, since they were already picked up by the previous trigger and are still being processed. I have to ensure that only new files are distributed to threads for processing.

Can I use a file locking mechanism for this?
1. Lock the file (using the Java file locking mechanism) once it has been given to a thread.
2. When the scheduler triggers a second time and distributes the file to a thread, check whether the file is locked. If it is not locked, process it; otherwise just exit the thread.
3. Release the lock and delete the file from the abc folder once processing of the file is complete.

Is there a better way than file locking to achieve this? Any help appreciated.

bharath
  • The underlying assumption that two threads processing two files concurrently is faster than one thread processing both files sequentially is probably false. You need to investigate that first. – user207421 Mar 30 '18 at 08:53

2 Answers

1

You could rename files (for example add a suffix .lock) when selecting them to flag the files as 'in progress'.

The next time the task is executed, it will filter out those flagged files.

Now, you could have a concurrency issue if two tasks are flagging files at the same time (let's say the fixed delay is very short). In that case, you should use a thread-safe component to flag the files as in progress.
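As a rough illustration of the renaming idea, here is a minimal sketch. The class name `RenameClaimer`, the method `claimNewFiles`, and the `*.csv` filter are assumptions made for the example, not part of the question. It relies on `Files.move` with `ATOMIC_MOVE`, so that if two runs race for the same file only one rename should succeed and the other sees `NoSuchFileException`. Atomic moves are not available on every file system, in which case the call fails with an exception.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags files as "in progress" by renaming them.
final class RenameClaimer {
    // Suffix taken from the answer's example; any marker would do.
    static final String IN_PROGRESS_SUFFIX = ".lock";

    /** Renames every unflagged *.csv file in dir and returns the renamed paths. */
    static List<Path> claimNewFiles(Path dir) {
        List<Path> claimed = new ArrayList<>();
        // The glob only matches names ending in .csv, so files already
        // renamed to *.csv.lock are filtered out automatically.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.csv")) {
            for (Path file : files) {
                Path flagged = file.resolveSibling(file.getFileName() + IN_PROGRESS_SUFFIX);
                try {
                    Files.move(file, flagged, StandardCopyOption.ATOMIC_MOVE);
                    claimed.add(flagged);
                } catch (NoSuchFileException alreadyClaimed) {
                    // Another run renamed this file first; skip it.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return claimed;
    }
}
```

The scheduled task would pass each returned path to a worker thread. The worker reads from the `.lock` file and deletes it when done, so nothing has to be un-flagged afterwards.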

Camille Vienot
1

One simple solution would be to have the task (the one that picks up the files and distributes them to multiple threads) maintain a Set of all the files it has picked up that are currently in progress. The next time it picks up files, it can check this Set and process only the new ones, after adding them to the Set. The catch is that the threads which process the files will have to remove each file from this Set once they are done with it. You will have to use synchronized blocks whenever you manipulate this Set.
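A minimal sketch of that bookkeeping is shown below. The class and method names (`InProgressTracker`, `selectNew`, `markDone`) are made up for the example. Instead of explicit synchronized blocks it uses a concurrent set from `ConcurrentHashMap.newKeySet()`, whose `add` is atomic and returns `false` when the element is already present. That is a substitution for the synchronized approach described above, not the same thing.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: remembers which files are currently being processed.
final class InProgressTracker {
    // Thread-safe set, shared by the scheduled task and the worker threads.
    private final Set<Path> inProgress = ConcurrentHashMap.newKeySet();

    /** Returns only the files not already in progress, marking them as in progress. */
    List<Path> selectNew(Collection<Path> listedFiles) {
        List<Path> fresh = new ArrayList<>();
        for (Path file : listedFiles) {
            // add() returns true only for the first caller to insert this path.
            if (inProgress.add(file)) {
                fresh.add(file);
            }
        }
        return fresh;
    }

    /** Called by a worker once it has finished with (and deleted) the file. */
    void markDone(Path file) {
        inProgress.remove(file);
    }
}
```

A worker should delete the input file before calling `markDone`. Otherwise a scan that runs between the two steps could see the file as new and pick it up again.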

Amudhan
  • Thanks Amudhan! The solution looks good. But I also wanted to check whether I can achieve this through a file locking mechanism. I have updated the question. Please let me know your thoughts. – bharath Mar 30 '18 at 09:25
  • @bharath, I don't have much knowledge of FileLock, but the javadoc says: "File locks are held on behalf of the entire Java virtual machine. They are not suitable for controlling access to a file by multiple threads within the same virtual machine". In this question FileLock was suggested because the use case is different: two different JVMs want to operate on a file exclusively. https://stackoverflow.com/questions/128038/how-can-i-lock-a-file-using-java-if-possible So I think FileLock is not a suitable solution for your scenario. – Amudhan Mar 30 '18 at 09:41