8

I am routinely missing the last few kb of a file I am trying to copy using shutil copyfile.

I did some research and do see someone asking about something similar here: python shutil copy function missing last few lines

But I am using copyfile, which DOES seem to use a with statement...

with open(src, 'rb') as fsrc:
    with open(dst, 'wb') as fdst:
        copyfileobj(fsrc, fdst)

So I am perplexed that more users aren't having this issue, if indeed it is some sort of buffering issue - I would think it'd be more well known.

I am calling copyfile very simply, don't think I could possibly be doing something wrong, essentially doing it the standard way I think:

copyfile(target_file_name,dest_file_name) 

Yet I am missing the last 4kb or so of the file eachtime.

I have also not touched the copyfile function which gets called in shutil which is...

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

So I am at a loss, but I suppose I am about to learn something about flushing, buffering, or the with statement, or ... Help! thanks


to Anand: Anand, I avoided mentioning that stuff bc it's my sense that it's not the problem, but since you asked... executive summary is that I am grabbing a file from an FTP, checking if the file is different from the last time I saved a copy, if so, downloading the file and saving a copy. It's circuitous spaghetti code and was written when I was a truly pure utilitarian novice of a coder I guess. It looks like:

for filename in ftp.nlst(filematch):
    target_file_name = os.path.basename(filename)
    with open(target_file_name ,'wb') as fhandle:
    try:
        ftp.retrbinary('RETR %s' % filename, fhandle.write)
        the_files.append(target_file_name)
        mtime = modification_date(target_file_name)
        mtime_str_for_file = str(mtime)[0:10] + str(mtime)[11:13] + str(mtime)[14:16]    + str(mtime)[17:19] + str(mtime)[20:28]#2014-12-11 15:08:00.338415.
        sorted_xml_files = [file for file in glob.glob(os.path.join('\\\\Storage\\shared\\', '*.xml'))]
        sorted_xml_files.sort(key=os.path.getmtime)
        last_file = sorted_xml_files[-1]
        file_is_the_same = filecmp.cmp(target_file_name, last_file)
        if not file_is_the_same:
            print 'File changed!'
            copyfile(target_file_name, '\\\\Storage\\shared\\'+'datebreaks'+mtime_str_for_file+'.xml') 
        else:
            print 'File '+ last_file +' hasn\'t changed, doin nothin'
            continue
Community
  • 1
  • 1
10mjg
  • 573
  • 1
  • 6
  • 18
  • Can you show more of your code, how are you creating the `target_file_name` , as well as how are you creating the target_file itself? – Anand S Kumar Jul 21 '15 at 18:38
  • is there one specific file it always happens with? what os are you on and what python? can you post a file that it always does this with? are you trying to write to a network drive or something? – Joran Beasley Jul 21 '15 at 18:38
  • Anand, I responded to you in the post, wasn't sure how else to do it bc too many characters for a comment. – 10mjg Jul 21 '15 at 18:49
  • Joran, it is happening almost every time. Maybe one out of every few hundred files work correctly. OS is windows 7 pro, python is 2.7. yes, writing to a network drive. the files are XML files. basically i am routinely missing the last say 10-50 lines of the XMLs. – 10mjg Jul 21 '15 at 18:50
  • @10mjg Are you creating those files that are being copied? In the same code? Or do they already exist in the system? – Anand S Kumar Jul 21 '15 at 19:05
  • @AnandSKumar i am not creating the xml files , just downloading them from an FTP from a third party. not sure if that answers your question - thx – 10mjg Jul 21 '15 at 19:22
  • 2
    @10mjg Try doing - `fhandle.flush()` right after - `ftp.retrbinary('RETR %s' % filename, fhandle.write)` . – Anand S Kumar Jul 21 '15 at 19:24
  • ZOMG ZOMG ZOMG that seemed to work. Anand, genius! Let me test a little further... – 10mjg Jul 21 '15 at 19:40

3 Answers3

4

The issue here would most probably be that , when executing the line -

ftp.retrbinary('RETR %s' % filename, fhandle.write)

This is using the fhandle.write() function to write the data from the ftp server to the file (with name - target_file_name) , but by the time you are calling -shutil.copyfile - the buffer for fhandle has not completely flushed, so you are missing out on some data when copying the file.

To make sure that this does not occur, you can either move the copyfile logic out of the with block for fhandle .

Or you can call fhandle.flush() to flush the buffer , before copying the file .

I believe it would be better to close the file (move the logic out of the with block). Example -

for filename in ftp.nlst(filematch):
    target_file_name = os.path.basename(filename)
    with open(target_file_name ,'wb') as fhandle:
        ftp.retrbinary('RETR %s' % filename, fhandle.write)
    the_files.append(target_file_name)
    mtime = modification_date(target_file_name)
    mtime_str_for_file = str(mtime)[0:10] + str(mtime)[11:13] + str(mtime)[14:16]    + str(mtime)[17:19] + str(mtime)[20:28]#2014-12-11 15:08:00.338415.
    sorted_xml_files = [file for file in glob.glob(os.path.join('\\\\Storage\\shared\\', '*.xml'))]
    sorted_xml_files.sort(key=os.path.getmtime)
    last_file = sorted_xml_files[-1]
    file_is_the_same = filecmp.cmp(target_file_name, last_file)
    if not file_is_the_same:
        print 'File changed!'
        copyfile(target_file_name, '\\\\Storage\\shared\\'+'datebreaks'+mtime_str_for_file+'.xml') 
    else:
        print 'File '+ last_file +' hasn\'t changed, doin nothin'
        continue
Anand S Kumar
  • 88,551
  • 18
  • 188
  • 176
  • Thank you Anand this fix definitely was the first and most direct fix for the symptom I had. I will also consider nsilent22's solution which seems like it might be more fundamentally sound. Any thoughts welcome. – 10mjg Jul 21 '15 at 19:51
  • 1
    nslient22's is the same solution, to close the file before we send it to `copyfile()` function , by moving the copyfile (and all logic that is not dependent on the file handle) outside the `with` block. – Anand S Kumar Jul 21 '15 at 19:54
  • Ah you are right, I focused on the 'flush' idea - but you also specified "To make sure that this does not occur, you can either move the copyfile logic out of the with block for fhandle .". Thank you so much. – 10mjg Jul 21 '15 at 19:58
  • (i focused on the flush idea bc i first saw your comment, before seeing this answer. thanks again). – 10mjg Jul 21 '15 at 20:08
  • Oh ok, sorry that was just to test whether that is the exact issue (I had a hunch, just wanted to confirm). – Anand S Kumar Jul 21 '15 at 20:10
2

You are trying to copy a file that was not closed. That's why buffers were not flushed. Move the copyfileobj out of the with block, to allow fhandle beeing closed.

Do:

with open(target_file_name ,'wb') as fhandle:
    ftp.retrbinary('RETR %s' % filename, fhandle.write)

# and here the rest of your code
# so fhandle is closed, and file is stored completely on the disk
nsilent22
  • 2,763
  • 10
  • 14
  • I will take a look at this right after I finish testing Anand's flushing solution which I caught earlier and also seemed to fix it. I am not smart enough to know which solution is better/etc. Yours seems to be a more fundamental solution. Thank you and any other insight is appreciated. – 10mjg Jul 21 '15 at 19:48
  • This also works after a little bit of testing. Wow, so obvious in hindsight. Thank you. – 10mjg Jul 21 '15 at 19:57
1

This looks like there is a better way to do nested withs:

with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
        copyfileobj(fsrc, fdst)

I'd try something more like this. I'm far from an expert, hopefully someone more knowledgeable can lend some insight. My best thought is that the inner with closes before the outer one.

Community
  • 1
  • 1
Will
  • 4,299
  • 5
  • 32
  • 50
  • I would be happy to try/test that. Perhaps I am naive but isn't it odd though that ... shutil copyfile could be used so widely and yet be suboptimal / need that fix? – 10mjg Jul 21 '15 at 18:53
  • Yes, it is odd. I'll honestly be a little surprised if this fixes it, but `with` blocks always mess me up. I use `open()` and `close()` like a noob :) – Will Jul 21 '15 at 18:55