Scenario:
- In my application, there are 3 processes which are copying documents on a shared drive in their respective folders.
As soon as any document is copied on shared drive (by any process), directory watcher (Java) code picks up the document and call the Python script using "Process" and do some processing on the document. code snippet is as follows:
Process pr = Runtime.getRuntime().exec(pythonCommand); // retrieve output from python script BufferedReader bfr = new BufferedReader(new InputStreamReader(pr.getInputStream())); String line = ""; while ((line = bfr.readLine()) != null) { // display each output line from python script logger.info(line); } pr.waitFor();
Currently my code waits till python code execution is completed on the document. Only after that it pick up the next document. Python code takes 30 secs to complete.
- After processing the document, document is moved from the current folder to archive OR error folder.
- Please find below screen shot of the scenario:
What is the problem?
- My code is processing documents in sequential manner and I need to process the document in parallel.
- As Python code takes around 30 seconds, some of the events created by directory watcher are also getting lost.
- If around 400 documents are coming within a short span of time, document processing stops.
What I am looking for?
- Design solution for processing documents in parallel.
- In case of any failure scenario for document processing, pending documents must be processed automatically.
- I tried spring boot schedular as well but still documents are getting processed in sequential manner only.
- Is it possible to call the Python code in parallel as a background process.
Sorry for the long question but I am stuck at this from many days and already looked many similar questions. Thank you!