Copy file if it doesn't already exist

Question

I'm fairly new to python, and I'm wondering how I can copy and paste a file from one location to another with first checking to see if the copied file exists in the destination folder?

The reason I want to check if the file exists is this script will be put on a task scheduler and run on a set schedule, so I don't want to be copying everything every single time, just those files that don't exist in the destination folder?

Thanks in advance!

FWIW, I think there's nearly always a better way to handle file creation/deletion/use than `os.path.exists`. In most cases there's another module that handles it more elegantly (as I used in my answer using glob to compare two lists) and if there's not then try/catch does a better job of preventing the race condition. I can't think of any script I've written that uses `os.path.exists` that I couldn't re-write to use `glob` with better functionality. — Adam Smith, Dec 16 '13 at 17:07
@adsmith unless you have some security concerns, there's no reason not to use `os.path.exist` — yuvi, Dec 16 '13 at 17:18

score 11 · Answer 1 · edited Jul 26 '19 at 10:01

11

import glob
import os.path
import shutil

SRC_DIR = #your source directory
TARG_DIR = #your target directory

GLOB_PARMS = "*" #maybe "*.pdf" ?

for file in glob.glob(os.path.join(SRC_DIR, GLOB_PARMS)):
    if file not in glob.glob(os.path.join(TARG_DIR, GLOB_PARMS)):
        shutil.copy(file,TARG_DIR)
    else:
        print("{} exists in {}".format(
            file,os.path.join(os.path.split(TARG_DIR)[-2:])))
        # This is just a print command that outputs to console that the
        # file was already in directory

I'm assuming you're trying to send a whole folder over with this command, otherwise glob uses pretty easy to understand interface. glob.glob("*.txt") will grab all the files with a .txt extension, etc. Shouldn't be too hard to tweak this to exactly what you want.

It's important to note that file copy usually involves a race condition. Basically, time passes between checking to see if the file isn't in TARG_DIR (if file not in glob.glob(TARG_DIR)) and actually copying it there (shutil.copy(file,TARG_DIR)). In that amount of time, the file could end up there, which will cause shutil.copy to overwrite the file. This may not be your intended functionality, in which case you should look into a different method. I don't know of a good one without some research that will try to copy a file but return an exception if that file already exists.

Try/Except blocks, as another answer mentioned, may be useful here as well if it's possible you won't have write access to the directory when your script runs. shutil.copy will return an IOError exception if that is the case. I believe glob will just return an empty list if you don't have read access to your source directory (which in turn will feed nothing through the "For" loop, so you won't have any errors there).

EDIT: Apparently glob doesn't work the way I remembered it did, sorry about that.

edited Jul 26 '19 at 10:01

Robert

1,357
15
26

answered Dec 16 '13 at 16:41

Adam Smith

52,157
12
73
112

1

In this case, I don't want to copy an entire folder, I want to copy contents within the folder, PDF files. – mpboyle Dec 16 '13 at 16:55
1

That's what this does -- sorry I was unclear. `glob.glob` returns a list of the contents of a folder based on some parameters you give it (`glob.glob(TARG_DIR+"\\*.pdf")` would return a list of files in TARG_DIR with the extension PDF, for instance). – Adam Smith Dec 16 '13 at 17:02
here is what i've tried and an getting an error: `import glob import os.path import shutil SRC_DIR = "C:\\Users\\mboyle\\Documents\\Source" TARG_DIR = "C:\\Users\\mboyle\\Documents\\Target" for file in glob.glob(SRC_DIR): if file not in glob.glob(TARG_DIR): shutil.copy(file,TARG_DIR) else: print "exists"` – mpboyle Dec 16 '13 at 17:28
Line 10, which is the `shutil.copy(file,TARG_DIR)`. Do I need to specify a file type (.pdf)? – mpboyle Dec 16 '13 at 17:30
No. Can you paste the stack trace for me? – Adam Smith Dec 16 '13 at 17:33
`Traceback (most recent call last): File "C:\Users\mboyle\Desktop\CopyPaste.py", line 10, in shutil.copy(file,TARG_DIR) File "C:\Python27\ArcGIS10.1\lib\shutil.py", line 116, in copy copyfile(src, dst) File "C:\Python27\ArcGIS10.1\lib\shutil.py", line 81, in copyfile with open(src, 'rb') as fsrc: IOError: [Errno 13] Permission denied: 'C:\\Users\\mboyle\\Documents\\Source'` – mpboyle Dec 16 '13 at 17:34
For whatever reason you don't have permissions to that folder. It might help to put `print(file)` above `shutil.copy(file,TARG_DIR)` to help track it down. – Adam Smith Dec 16 '13 at 17:36
i checked and have full control over both the source and target folders – mpboyle Dec 16 '13 at 17:40
did you add that `print` line? what does it print? – Adam Smith Dec 16 '13 at 17:42
`C:\Users\mboyle\Documents\Source Traceback (most recent call last): File "C:\Users\mboyle\Desktop\CopyPaste.py", line 11, in shutil.copy(file,TARG_DIR) File "C:\Python27\ArcGIS10.1\lib\shutil.py", line 116, in copy copyfile(src, dst) File "C:\Python27\ArcGIS10.1\lib\shutil.py", line 81, in copyfile with open(src, 'rb') as fsrc: IOError: [Errno 13] Permission denied: 'C:\\Users\\mboyle\\Documents\\Source' – mpboyle Dec 16 '13 at 17:54
Change your print line to `print("File is currently %s" % (file))` and copy/paste the entire output. I don't need the stack trace again -- it's going to be the same. I'm thinking Python isn't pulling the file you want it to. – Adam Smith Dec 16 '13 at 17:57
i have tried this and it works. The only problem is i don't believe it will run through and check to see if a the same file currently exists or not, it just copies everything. `import os import shutil dir_src = "C:\\Users\\mboyle\\Documents\\Source" dir_dst = "C:\\Users\\mboyle\\Documents\\Target" for file in os.listdir(dir_src): src_file = os.path.join(dir_src, file) dst_file = os.path.join(dir_dst, file) shutil.copy(src_file, dst_file)` – mpboyle Dec 16 '13 at 18:01
You're right, that code won't check for file existence. I also found the bug in my code -- `glob.glob(this_is_a_dir)` doesn't return the contents of the directory, oops! `glob.glob(root\path\path2\targdir)` returns `['root\\path\\path2\\targdir','root\\path\\path2\\targdirectory']` which isn't what you're looking for. I've updated my answer and it should now run without bugs (at least it did on my system) – Adam Smith Dec 16 '13 at 18:14

jwarner112 · Answer 2 · 2013-12-16T17:02:34.643

In Python, it can often be seen that you will hit errors, known as exceptions, when running code. For this reason, the try/catch has been deployed.

Here is a snippet of code I use in my day-to-day that clears files from a directory, or skips if they do not exist.

def DeleteFile(Path_):
    """Deletes saved project AND its corresponding "files" folder."""
    try: #deletes the folder
        os.remove(Path_)
    except OSError:
        pass
    try: #deletes the file, using some fancy python operations to arrive at the filename
        shutil.rmtree(os.path.join(os.path.dirname(Path_),os.path.splitext(os.path.basename(Path_))[0])+"_files", True)
    except OSError:
        pass

This is a classic example of checking for if a file exists. Instead of deleting inside your try statement, you could have try to copy the file. If it fails, it moves on to pass, which just skips the try/catch block.

Take note, try/catch can be used to catch any exception, or it can be used to catch specific ones. I have OSError scrawled in but read through them to be sure that's the one you want. If you have a specific error in a catch and the system gives back the wrong kind of error, your try/catch will not work the way you want it to. So, be sure. IN best case, be general.

Happy coding!

EDIT: It is worth noting that this try/catch system is a very Pythonic way to go about things. try/catch is very easy and popular, but your situation may call for something different.

EDIT: I'm not sure if it's worth noting this, but I realize my answer doesn't directly tell you how to check if a file exists. Instead, it assumes that it does not and proceeds with the operation anyway. If you hit a problem (i.e., it exists and you need to overwrite), you can make it so that it automatically skips over the entire thing and goes onto the next one. Again, this is only one of many methods for accomplishing the same task.

Upper-case letters in function and variable names are considered bad style in most Python projects: https://www.python.org/dev/peps/pep-0008/ — karlson, Dec 30 '15 at 13:54

Copy file if it doesn't already exist

2 Answers2