I have a Python script that processes a file. Currently I start the script manually once the file has been generated. What I want is for the script to run constantly and check for the existence of the file. There are plenty of examples of how to achieve that:

  • Python while loop to check if file exists
  • Check and wait until a file exists to read it
  • Python (Watchdog) - Waiting for file to be created correctly

However, my file is about 10 GB, so the script will see that the file exists as soon as it is created and will start processing it while it is still being generated.

Any tips?

What I could probably do is, once the file exists, enter a second loop that checks its size or checksum; if the size or checksum no longer changes, the file has been completely generated.
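The size-polling idea could be sketched roughly as follows. This is only a minimal illustration, not a tested solution for the 10 GB case: the function name `wait_until_stable` and the parameters `interval`, `stable_checks` and `timeout` are made up for this example, and it assumes that a size which stays the same over several consecutive checks means the writer has finished. A stalled download would look identical, so the interval should be generous.

```python
import os
import time


def wait_until_stable(path, interval=5.0, stable_checks=3, timeout=None):
    """Block until `path` exists and its size has stopped changing.

    Hypothetical helper: the size must be identical for `stable_checks`
    consecutive polls, `interval` seconds apart, before we return it.
    """
    start = time.monotonic()
    last_size = None
    unchanged = 0
    while True:
        try:
            size = os.stat(path).st_size
        except FileNotFoundError:
            size = None  # file has not been created yet

        if size is not None and size == last_size:
            unchanged += 1
        else:
            unchanged = 0  # missing or still growing: start counting again

        if unchanged >= stable_checks:
            return size
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError(f"{path!r} did not stabilise within {timeout} s")

        last_size = size
        time.sleep(interval)
```

Calling `wait_until_stable("file.bin")` before the processing step would then delay processing until the size has been constant for about 15 seconds with the defaults above. Comparing sizes is much cheaper than recomputing a checksum of a 10 GB file on every poll.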

Sinan
  • 2
    How is the file generated? By another program? If the program closes after outputting the file you could just do `my_programm && my_python_file.py`. If your program does not finish after creating the file, you could try to check if the file exists, and if it does see if no more processes are looking at that file, see also: https://stackoverflow.com/a/44615315/2305545 – NOhs May 09 '19 at 15:20
  • @NOhs the file is being downloaded with wget. I cannot do wget file && mypython.py, since the python script should be constantly running. The python script will be a container which is constantly active. Thank you for the link. let me read and test it. – Sinan May 09 '19 at 15:27
  • 1
    Do you have control over the wget process? i.e. if you're downloading `foo.zip`, can you modify your wget script to create a file named `foo.zip.FINISHED` when it's done? Then your python script could just look for any file ending in `.FINISHED`, and know that the real file is fully downloaded. – John Gordon May 09 '19 at 15:39
  • 1
    @Sinan you should then wrap wget inside of your python script and call it from there (or use urllib or requests or something like that); that way you can keep your script active all the time and control the download and even trigger occasional redownload if needed. – icwebndev May 09 '19 at 15:53
  • @JohnGordon I have control over the wget process, your suggestion is a good one. Thanks! – Sinan May 10 '19 at 08:08
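The marker-file suggestion from the comments might look something like this on the Python side. It is a sketch under assumptions: the downloader is taken to create the marker only after a successful download (for example `wget -O foo.zip <url> && touch foo.zip.FINISHED`), and the names `find_finished`, `watch` and `process` are invented for this illustration.

```python
import time
from pathlib import Path

MARKER_SUFFIX = ".FINISHED"  # suffix proposed in the comment above


def find_finished(directory):
    """Return the data files in `directory` that have a marker next to them."""
    ready = []
    for marker in sorted(Path(directory).glob("*" + MARKER_SUFFIX)):
        # foo.zip.FINISHED -> foo.zip
        data_file = marker.with_name(marker.name[: -len(MARKER_SUFFIX)])
        if data_file.is_file():
            ready.append(data_file)
    return ready


def watch(directory, process, interval=10.0):
    """Poll forever and hand each finished file to `process` (hypothetical callback)."""
    while True:
        for data_file in find_finished(directory):
            process(data_file)
            # Remove the marker so the same file is not processed twice.
            data_file.with_name(data_file.name + MARKER_SUFFIX).unlink()
        time.sleep(interval)
```

Unlike size polling, this does not have to guess when the download is complete: the marker only appears once the downloader has exited successfully.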

1 Answer

You can get the size of the file with `os.stat` (its `st_size` field) and compare that against the expected size. For more information, see this answer:

How to check file size in python?
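A minimal sketch of that comparison, assuming the expected size in bytes is known in advance (for instance from the server's `Content-Length` header, which is an assumption and not something stated in the question). The helper name `is_complete` is hypothetical.

```python
import os


def is_complete(path, expected_size):
    """Return True once `path` exists and is exactly `expected_size` bytes long."""
    try:
        return os.stat(path).st_size == expected_size
    except FileNotFoundError:
        return False  # file has not been created yet
```

The running script could call `is_complete(...)` in its polling loop and start processing only when it returns `True`.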