I have to write a program that checks whether a particular directory contains any files with a specific extension and, if it finds any, reads them one by one and loads their data into a database.

This is the rough algorithm in my mind:

  1. In an infinite while() loop, keep checking whether the directory has any files with that particular extension (e.g. check whether it has any *.xml files). I can use the PHP glob() function.

  2. If yes, then in a foreach loop, read data from each file and load it into the database.

  3. Once a file's data has been loaded, delete it.

My Question:

I will be constantly checking whether there are any .xml files in the directory. This means that I will often get a true result ("Yes, there are .xml files in the directory") even for files whose data is BEING loaded.

So once a file has been found in the directory, I need a way to tell whether its data is already in the process of being loaded into the database. How do I check that?

The data-loading process is that I extract the useful data from the file into a .csv file and then use a LOAD DATA INFILE SQL query to load that data into my MySQL database.

Solace
  • Since you need PHP code to determine whether an external process (mysql `LOAD...`) is reading a file, I *think* you're going to need to look at [semaphores](https://en.wikipedia.org/wiki/Semaphore_(programming)#Semaphores_vs._mutexes) or mutexes. [Lockfiles](https://en.wikipedia.org/wiki/File_locking#Lock_files) are a kind of semaphore. – Mike Sherrill 'Cat Recall' Nov 12 '16 at 13:43
  • @MikeSherrill'CatRecall' The file is first converted to `.csv` and then MySQL `LOAD...` is executed. I want to check if the file is BEING CONVERTED TO .csv. The file is converted to .csv file through a SHELL command (the commands we execute in Windows cmd or Linux terminal). – Solace Nov 12 '16 at 14:10
  • *"I want to check if the file is BEING CONVERTED TO .csv."* Why? If only your PHP code calls a shell script or batch file to convert xml to csv, there's no problem with concurrent access. Is your shell script converting xml to csv as a background job? – Mike Sherrill 'Cat Recall' Nov 12 '16 at 16:46
  • @MikeSherrill'CatRecall' An infinite `while` loop continuously checks if there is a `.xml` file in the directory. Now say, at this very moment, it finds a `.xml` file in the directory, and I start executing the shell command which extracts useful data from this `.xml` file and inserts that into a `.csv` file. (After the data is extracted from .xml and dumped into .csv file, the .xml file is deleted). Now since we are in an infinite while loop (which keeps checking if there is a `.xml` file in the directory), - continued in next comment! – Solace Nov 12 '16 at 19:08
  • @MikeSherrill'CatRecall' -continued from previous comment: say, a nanosecond after this very moment, the shell command is BEING executed and the useful data from the .xml file is BEING extracted; BUT the condition which checks if there is a .xml file in the directory WILL RETURN TRUE. and hence the script which extracts useful data from the .xml file will start again, for the same .xml file (because the .xml file gets deleted only AFTER the useful data is extracted from it. ) – Solace Nov 12 '16 at 19:08
  • *"An infinite while loop continuously checks..."* No, it doesn't. It runs a series of commands in a never-ending loop. One of those commands should *probably* be invoking a shell script *and waiting for it to finish*, rather than invoking a shell script in the background. – Mike Sherrill 'Cat Recall' Nov 12 '16 at 19:29

1 Answer

One solution is to use inotifywait to watch for filesystem events and then act on them, as suggested in this answer: https://stackoverflow.com/a/6767891/2032943
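
A minimal sketch of the inotifywait approach, assuming Linux with the inotify-tools package installed. The names `watch_dir` and `process_file` are hypothetical, and `process_file` is only a placeholder for the real convert-and-load step. Each file is handled in the foreground before the next event is read, so the same file should not be picked up a second time while it is still being processed.

```shell
#!/usr/bin/env bash
# Sketch only: react to new .xml files instead of polling in a loop.
# Assumption: inotifywait (from inotify-tools) is available on the system.

process_file() {
    # Hypothetical placeholder: convert "$1" to .csv and load it here,
    # then delete the source file once the work has finished.
    echo "processing $1"
    rm -f -- "$1"
}

watch_dir() {
    local dir=$1
    # -m keeps inotifywait running; close_write fires once a writer has
    # closed the file, moved_to covers files renamed into the directory.
    inotifywait -m -q -e close_write -e moved_to --format '%f' "$dir" |
    while IFS= read -r name; do
        case $name in
            *.xml) process_file "$dir/$name" ;;  # runs to completion first
        esac
    done
}

# Usage (blocks until interrupted):
# watch_dir /path/to/incoming
```

Waiting for `close_write` rather than `create` is a deliberate choice here: it avoids starting on a file that is still being written.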

Also, if you want to check whether the file is already in use by another command, you can use the Linux lsof command to see whether any process holds an open handle to the file:

lsof | grep <filename>

Note that these commands are specific to Linux and will not work on Windows.
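
As a rough sketch, the lsof check can be wrapped in a small function that a script calls before touching a file. The function name `is_in_use` is hypothetical. This variant passes the path to lsof directly instead of grepping the full listing, and relies on lsof exiting with a non-zero status when no process has the file open. Depending on privileges, lsof may not see files opened by other users' processes.

```shell
#!/usr/bin/env bash
# Sketch only: succeed (exit 0) if some process has the given file open.
# Assumption: lsof is installed; output is discarded, only the status is used.

is_in_use() {
    # "--" ends option parsing so unusual file names are treated as paths.
    lsof -- "$1" >/dev/null 2>&1
}

# Usage:
# if is_in_use /path/to/file.xml; then
#     echo "still open elsewhere, skipping for now"
# fi
```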

Jay Rajput
  • Firstly, thank you. Secondly, doesn't the Linux way make it OS dependent? – Solace Nov 12 '16 at 13:35
  • Yes, the Linux way makes it OS dependent, so the solution will not work on Windows. Let me indicate that in the answer. – Jay Rajput Nov 12 '16 at 14:34
  • Checking for a filename won't catch cases when the file is referenced using hard link or the file has been deleted. If you want to be extra sure, check the inode number using `stat $filename`, then `lsof | grep $inode_number` – matt Apr 03 '23 at 11:21