105

I'm trying to copy /home/myUser/dir1/ and all its contents (and their contents, etc.) to /home/myuser/dir2/ in Python. Furthermore, I want the copy to overwrite everything in dir2/.

It looks like distutils.dir_util.copy_tree might be the right tool for the job, but I'm not sure if there's anything easier/more obvious to use for such a simple task.

If it is the right tool, how do I use it? According to the docs, it takes 8 parameters. Do I have to pass all 8, or just src, dst and update? And if so, how? (I'm brand new to Python.)

If there's something out there that's better, can someone give me an example and point me in the right direction?

starball
IAmYourFaja
  • 13
    `os.system("cp -rf /src/dir /dest/dir")` would be pretty easy... – Joran Beasley Oct 02 '12 at 02:46
  • Thanks @JoranBeasley (+1) - however, according to `cp`'s docs, the `-f` arg ("force"): *if an existing destination file cannot be opened, remove it and try again*... this doesn't seem to be the same as "overwrite all". Can you confirm it is the same and that whatever `dir1`'s contents are, they all get (recursively) copied to `dir2`'s subtree? Thanks again! – IAmYourFaja Oct 02 '12 at 02:49
  • try it ... it should work fine :) ive never had a problem with it... – Joran Beasley Oct 02 '12 at 02:55
  • 4
    @4herpsand7derpsago: `cp` overwrites files by default. There's a switch that *prevents* it from overwriting files, but not the other way around. – Blender Oct 02 '12 at 03:23
  • 3
    @JoranBeasley Is that method platform independent? – HelloGoodbye Jul 06 '18 at 08:33

6 Answers

82

Notice:

distutils is deprecated and was removed in Python 3.12. If you need a solution for Python 3.12 or later, see the other answers to this question.


Original answer:

You can use distutils.dir_util.copy_tree. It works just fine, and you don't have to pass every argument: only src and dst are mandatory.
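As a minimal sketch of that call: only `src` and `dst` are passed. The wrapper name `copy_tree_overwrite` and the fallback branch are my own assumptions, added because `distutils` is missing from Python 3.12 and later; there the sketch falls back to `shutil.copytree(..., dirs_exist_ok=True)`, which needs Python 3.8+.

```python
import shutil


def copy_tree_overwrite(src, dst):
    """Copy src into dst, overwriting files that already exist in dst."""
    try:
        # distutils was removed from the standard library in Python 3.12.
        from distutils.dir_util import copy_tree
    except ImportError:
        # Assumed fallback for newer Pythons (requires 3.8+).
        shutil.copytree(src, dst, dirs_exist_ok=True)
    else:
        # Only src and dst are required; the other parameters have defaults.
        copy_tree(src, dst)
```

Files that exist only in the destination are left alone in both branches, so this merges and overwrites rather than replacing the whole directory.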

However, in your case you can't use a similar tool like shutil.copytree, because it behaves differently: the destination directory must not already exist, so that function can't be used to overwrite its contents. (This changed in Python 3.8, which added the dirs_exist_ok argument.)

If you want to use the cp tool as suggested in the question comments, beware that the subprocess module is currently the recommended way to spawn new processes, as noted in the documentation of the os.system function.
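A rough sketch of the `cp` route through `subprocess`, assuming a POSIX system where `cp` is on the PATH (so it is not portable to plain Windows). The function name `cp_overwrite` is made up for illustration.

```python
import os
import subprocess


def cp_overwrite(src, dst):
    """Copy the contents of src into dst using the external cp tool."""
    # "src/." tells cp to copy the directory's contents into dst instead of
    # creating dst/<name of src> when dst already exists.
    source_contents = os.path.join(src, ".")
    # A list of arguments avoids the shell, so spaces in paths are safe;
    # check=True raises CalledProcessError if cp exits with an error.
    subprocess.run(["cp", "-rf", source_contents, dst], check=True)
```

Passing the arguments as a list is the main practical gain over `os.system`: no quoting problems, and a failed copy raises an exception instead of returning a status code that is easy to ignore.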

Azhar Khan
Vicent
  • 1
    Thanks! Just a note, I needed two imports to get this to work. See http://stackoverflow.com/questions/18908941/unable-to-import-distutils-dir-util-on-windows/33417555#33417555 – kmarsh Oct 29 '15 at 15:15
  • This module some times has conflict with git files. For example, it cannot copy .git/COMMIT_EDITMSG files. Recommend shutil.copytree instead. – Jinhua Wang Aug 10 '17 at 23:50
  • 2
    What is the logical/philosophical difference between `shutil` and `distutils.dir_util`? Is one preferable to the other in general? – Mike Ottum Feb 28 '18 at 18:26
  • 1
    distutils.dir_util.copy_tree doesn't overwrite existing files – Eli Borodach Mar 20 '18 at 10:43
  • Found the solution, you need to set the preserve_mode=0, otherwise, it doesn't overwrite anything. – Eli Borodach Mar 21 '18 at 15:45
  • 1
    Also, beware that if you call this method twice with the same arguments it will fail if you cleaned the target directory: https://bugs.python.org/issue22132 – JP Illanes Nov 27 '18 at 10:18
  • 6
    Isn't `distutils` deprecated, or is that only in favour of `setuptools` in `setup.py`? – Andreas is moving to Codidact Jun 28 '19 at 22:52
  • distutils.dir_util.copy_tree() works wonders if you want to avoid shutil.copytree's "cannot copy when file exists' issue. – ewokx Oct 07 '20 at 10:06
64

Have a look at the shutil package, especially rmtree and copytree. You can check whether a file or path exists with os.path.exists(<path>).

import shutil
import os

def copy_and_overwrite(from_path, to_path):
    if os.path.exists(to_path):
        shutil.rmtree(to_path)
    shutil.copytree(from_path, to_path)

Vicent was right about copytree not working if the directories already exist, so distutils is the nicer version. Below is a fixed version of shutil.copytree. It's basically copied 1:1, except that the first os.makedirs() call is guarded by an if statement:

import os
from shutil import Error, copy2, copystat

def copytree(src, dst, symlinks=False, ignore=None):
    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    if not os.path.isdir(dst):  # This one line does the trick
        os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                # Will raise a SpecialFileError for unsupported file types
                copy2(srcname, dstname)
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error as err:
            errors.extend(err.args[0])
        except OSError as why:
            errors.append((srcname, dstname, str(why)))
    try:
        copystat(src, dst)
    except OSError as why:
        # Copying file access times may fail on Windows
        if getattr(why, 'winerror', None) is None:
            errors.append((src, dst, str(why)))
    if errors:
        raise Error(errors)
phoenix
Michael
  • 3
    Your example doesn't overwrite anything, simply replace one dir with another. Replacing content and overwriting content are different things as far as I know. The question specifically ask for overwriting. – Vicent Oct 02 '12 at 11:11
  • Well, I guess that depends on your definition of overwrite. If you want to have non-duplicate files in the target folder preserved or not. This version is not preserving anything, true. – Michael Oct 02 '12 at 11:19
  • I just had a look into the code of `copytree`. So if you just want this plain overwrite as mentioned by @Vincent, you can simply use `shutil.copytree()` and you're done. Present files are automatically overwritten. – Michael Oct 02 '12 at 11:25
  • 2
    `shutil.copytree` shouldn't work; if it does, that's a bug, because its docs say "The destination directory, named by *dst*, must not already exist". – Fred Foo Oct 02 '12 at 11:40
  • how to use ignore param, can you share an example? – Ashutosh gupta Sep 09 '21 at 19:29
  • This won't merge contents. – sivann Jun 08 '22 at 06:08
51

In Python 3.8 the dirs_exist_ok keyword argument was added to shutil.copytree():

dirs_exist_ok dictates whether to raise an exception in case dst or any missing parent directory already exists.

So, the following will work in recent versions of Python, even if the destination directory already exists:

shutil.copytree(src, dest, dirs_exist_ok=True)  # 3.8+ only!

One major benefit is that it's more flexible than distutils.dir_util.copy_tree(), as it takes additional arguments such as which files to ignore (see the documentation). On top of that, the accepted PEP 632 deprecated distutils in Python 3.10 and removed it in Python 3.12.
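To illustrate the extra flexibility, here is a small sketch that combines `dirs_exist_ok=True` with the `ignore` argument. The function name `merge_copy` and the patterns are only examples, not something from the documentation.

```python
import shutil


def merge_copy(src, dest):
    """Copy src into dest, overwriting clashing files and skipping some patterns."""
    return shutil.copytree(
        src,
        dest,
        # ignore_patterns() builds the callable that copytree expects;
        # the patterns below are illustrative.
        ignore=shutil.ignore_patterns("*.pyc", "__pycache__"),
        dirs_exist_ok=True,  # Python 3.8+ only
    )
```

Files that exist only in `dest` are kept, files present in both are overwritten with the version from `src`, and anything matching the patterns is never copied.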

orthocresol
39

Here's a simple solution to recursively overwrite a destination with a source, creating any necessary directories as it goes. This does not handle symlinks, but it would be a simple extension (see answer by @Michael above).

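import os
import shutil

def recursive_overwrite(src, dest, ignore=None):
    if os.path.isdir(src):
        # Create the destination directory if it is missing.
        if not os.path.isdir(dest):
            os.makedirs(dest)
        files = os.listdir(src)
        # ignore is an optional callable(src, names) returning the names to skip,
        # for example the result of shutil.ignore_patterns().
        if ignore is not None:
            ignored = ignore(src, files)
        else:
            ignored = set()
        for f in files:
            if f not in ignored:
                recursive_overwrite(os.path.join(src, f),
                                    os.path.join(dest, f),
                                    ignore)
    else:
        # Plain file: copy the contents over whatever is at dest.
        shutil.copyfile(src, dest)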
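Since `ignore` is called as `ignore(src, files)`, it is expected to be a function that receives a directory and the names inside it and returns the names to skip (the same convention `shutil.copytree` uses). A small sketch, where `skip_backups` and the `*.bak` pattern are hypothetical:

```python
import fnmatch
import shutil


def skip_backups(directory, names):
    """Hypothetical ignore callable: return the names in `directory` to skip."""
    return {name for name in names if fnmatch.fnmatch(name, "*.bak")}


# shutil.ignore_patterns() builds an equivalent callable from glob patterns.
skip_backups_builtin = shutil.ignore_patterns("*.bak")
```

Either callable can be passed as the `ignore` argument, e.g. `recursive_overwrite(src, dest, ignore=skip_backups)`.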
mgrant
  • Thanks this worked for me. I like the fact that it doesn't remove existing files. I had to add a little piece to make sure that the destination directory exists for files in the `else:` part. – Batandwa Oct 23 '14 at 09:19
  • shouldn't it be `else: shutil.copyfile(src, dest, ignore)`? – Jack James Feb 19 '16 at 14:08
  • This was the only function I found working when I have to do repetitive copying and deleting. – Niko Föhr Feb 01 '17 at 07:30
  • 1
    @JackJames No it shouldn't since `copyfile()` is only called for a single file. When using `ignore` then you pass a folder to `src` (since otherwise if you would ignore the file you are pass to `src` you would not have to call the function at all). When passing a folder, `copyfile()` will be never called for ignored files. – Roi Danton Apr 11 '17 at 09:34
  • I replaced `shutil.copyfile(src, dest)` with `try: copy2( src, dest ) except SameFileError: src.replace( dest )` to copyover the file meta data too. – Sun Bear Oct 07 '19 at 08:11
  • Can you explain what is `if ignore is not None: ignored = ignore(src, files)`? Is ignore a function? But it has not been declared yet... – Sun Bear Oct 07 '19 at 08:18
1

My simple answer.

import os
import shutil

def get_files_tree(src="src_path"):
    """Return the path of every file under src, skipping '.db' files."""
    req_files = []
    for root, _dirs, files in os.walk(src):
        for file in files:
            src_file = os.path.join(root, file)
            if src_file.endswith('.db'):
                continue
            req_files.append(src_file)

    return req_files

def copy_tree_force(src_path="", dest_path=""):
    """
    Copy every file under src_path into dest_path, overwriting existing files.
    Empty directories are not recreated.
    """
    for cf in get_files_tree(src=src_path):
        # Build the destination from the path relative to src_path, so the
        # result does not depend on which slash characters the paths use.
        df = os.path.join(dest_path, os.path.relpath(cf, src_path))
        os.makedirs(os.path.dirname(df), exist_ok=True)
        shutil.copy2(cf, df)
0

What about syncing the two folders instead?

How to synchronize two folders using python script

from dirsync import sync
source_path = '/Give/Source/Folder/Here'
target_path = '/Give/Target/Folder/Here'

sync(source_path, target_path, 'sync') #for syncing one way
sync(target_path, source_path, 'sync') #for syncing the opposite way

quine9997