
I'm trying to write a script that picks up video files (ranging from several MB to several GB) as they are written to a shared folder on a Windows server.

Ideally, the script will run on a Linux machine watching the Windows shared folder at an interval of something like every 15-120 seconds, and upload any files that have fully finished writing to the shared folder to an FTP site.

I haven't been able to determine any criterion that lets me know for certain whether a file has been fully written to the share. It seems that Windows reserves space on the share for the entire size of the file (so the file size does not grow incrementally), and the modified date seems to be the time the file started writing; it is not updated as the copy progresses. lsof and fuser do not seem to be aware of the file, and even the Samba tools don't indicate that it is locked, but I'm not sure whether that's because I haven't mounted with the correct options. I've tried things like opening the file or renaming it, and the best I've been able to come up with is a "Text file busy" error code, but this seems to cause major delays in file copying. Naively uploading the file without checking whether it has finished copying not only throws no error, but actually seems to upload NUL or random bytes from the allocated space to the FTP site, resulting in a totally corrupt file (if the network write is slower than the FTP upload).

I have zero control over the writing process. It will take place on dozens of machines and consist pretty much exclusively of Windows OS file copies to a network share.

I can control the share options on the Windows server, and I have full control over the Linux box. Is there some method of checking locks on a Windows CIFS share that would allow me to be sure that the file has completely finished writing before I try to upload it via FTP? Or is the only possible solution to have the Linux server locally own the share?

Edit

TL;DR: I'm really looking for the equivalent of something like 'lsof' that works for a CIFS-mounted share. I don't care how low-level it is, though it would be ideal if it were something I could call from Python. I can't move the share or rename the files before they arrive.
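
For reference, the "try to open it and look at the error" probe described above can be sketched roughly as follows. This is only an illustration under assumptions: `looks_finished`, `BUSY_ERRNOS`, and the `opener` parameter are hypothetical names, and the set of errno values a CIFS mount reports for a file that another client still holds open may differ between kernels and mount options.

```python
import errno

# Assumption: errnos that a CIFS mount may report while another client still
# has the file open. "Text file busy" is ETXTBSY; the others are guesses.
BUSY_ERRNOS = {errno.ETXTBSY, errno.EBUSY, errno.EACCES}


def looks_finished(path, opener=None):
    """Return False if opening `path` fails with a 'busy'-style error."""
    try:
        with open(path, "rb", opener=opener) as handle:
            handle.read(1)  # force at least one real read from the share
    except OSError as exc:
        if exc.errno in BUSY_ERRNOS:
            return False
        raise  # a missing file or a broken mount is a real error
    return True
```

As noted above, probing like this appeared to slow the copy down considerably, so it is shown to make the attempt concrete rather than as a solution.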

Paul
  • Can you take an md5sum of the files (before and after uploading)? They must match if the files are equal. – Usman Saleem Jan 14 '13 at 06:17
  • Is it an option to copy the file under a temporary name, perhaps hidden, and rename it after the copy is finished? This would prevent the file from being used while it is being written. – Pat Jan 14 '13 at 07:57
  • @UsmanSaleem I've tried to take the md5sum or an even simpler checksum, but when I try to compute the sum on a file that is still copying, the copying process seems to freeze or dramatically slow down, and I often get the "Text file busy" message from Linux. It's a potential solution, but I'm worried that it might very negatively impact the performance of the share. – Paul Jan 15 '13 at 02:44
  • @Pat The files are being copied to a network client using drag and drop in Windows Explorer. I have zero control over what the files can be named or whether or not they are hidden. – Paul Jan 15 '13 at 02:45

2 Answers


I had this problem before. I'm not sure my way is the best way, and it's most definitely a hacky fix, but I used a sleep interval and a file size check (I would expect the file to have grown if it was being written to).

In my case I wanted to know not only that the file was not being written to, but also that nothing else on the Windows share was being written to.

My code is:

# Take two listings 15 seconds apart and loop until they are identical.
while [ "$(ls -la "$REMOTE_CSV_DIR"; sleep 15)" != "$(ls -la "$REMOTE_CSV_DIR")" ]; do
    echo "File writing seems to be occurring, waiting for files to finish copying..."
done

(ls -la includes file sizes in bytes.)
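
Since the question asks for something callable from Python, the same two-listings comparison could be sketched like this. It is an approximation of the shell loop, not a drop-in replacement: `snapshot` and `wait_until_stable` are hypothetical names, and it compares only regular files by size and modification time instead of the full `ls -la` text.

```python
import os
import time


def snapshot(directory):
    """Map each regular file name to (size in bytes, mtime in nanoseconds)."""
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                result[entry.name] = (info.st_size, info.st_mtime_ns)
    return result


def wait_until_stable(directory, interval=15, sleep=time.sleep):
    """Block until two listings taken `interval` seconds apart are identical."""
    previous = snapshot(directory)
    while True:
        sleep(interval)
        current = snapshot(directory)
        if current == previous:
            return current
        previous = current
```

Like the shell version, this only helps if the size or timestamp actually changes while the file is being copied.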

mikejonesey
  • Thanks. File growing seems like a reasonable way to check, but as I stated above, when monitoring a Windows share, I don't see the file grow at all. Even if the file is 5GB, the size is instantly the entire 5GB, and does not change over time. Is it possible I'm missing some mount option that changes this behavior? – Paul Jan 29 '13 at 01:07
  • ls -lah would list 5G, but ls -la would list in bytes :), this should change as it's being uploaded even if to a temporary file first. – mikejonesey Feb 05 '13 at 11:00
  • It doesn't. It's not a bytes vs gigabytes issue. I literally mean that the instant a file appears in the directory, the size attribute reflects the entire size of the file, even though it has not finished copying. Bits, bytes, gigabytes, or otherwise. – Paul Feb 05 '13 at 21:54

What about this?

Change the windows share to point to an actual Linux directory reserved for the purpose. Then, with simple Linux scripts, you can readily determine if any files there have any writers. Once there is a file not being written to, copy it to the windows folder—if that is where it needs to be.
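
As a rough illustration of "determine if any files there have any writers" on a local Linux directory, here is a minimal sketch that scans `/proc` much like `lsof` does. The function name `pids_writing` is hypothetical, it assumes a Linux `/proc` filesystem, and it can only see processes the calling user is allowed to inspect. If Samba were serving the directory, the writer would show up as an `smbd` process.

```python
import os


def pids_writing(path):
    """Return PIDs on this Linux host that hold `path` open for writing."""
    target = os.path.realpath(path)
    writers = set()
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we may not inspect it
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") != target:
                    continue
                # fdinfo holds "flags:<TAB><octal open flags>" for the descriptor
                with open(f"/proc/{pid}/fdinfo/{fd}") as info:
                    fields = dict(line.split(":", 1) for line in info if ":" in line)
                flags = int(fields["flags"].strip(), 8)
            except (OSError, KeyError, ValueError):
                continue  # descriptor closed in the meantime, or unreadable
            if flags & os.O_ACCMODE != os.O_RDONLY:
                writers.add(int(pid))
    return writers
```

A file would then be treated as ready once `pids_writing(path)` returns an empty set. This only works for files on a filesystem local to the Linux box, not for a CIFS mount of a remote share.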

wallyk
  • Thanks, but if I could change the share, I would simply have it point to a Linux directory and monitor that. I'm really looking for a way to monitor a Windows share from Linux. – Paul Jan 29 '13 at 01:10
  • @Paul: Sorry, I guess I misunderstood what you meant by "I can control the share options on the Windows server". – wallyk Jan 29 '13 at 01:47
  • Sorry if I was unclear. What I meant is, if there was a share setting in the Windows server that would make the file size growing visible from Linux, or some kind of locking mechanism I could query, I could make that change. I can't change the address of the share because there are many users and a firewall punch involved, and the server is used for other purposes, so I can't replace it. – Paul Jan 29 '13 at 02:49