-3

Context

I have a list of filenames (many tens of thousands) that I would like to locate within a directory. For all files that are located they must be copied to a single output folder.

Using Python, what would be, in your opinion, the most efficient* strategy for doing this? I'm not looking for a solution, but good strategy to get started.

To break this down:

  • How the list of filenames should be stored and the method of searching if a filename is in the list?
  • How to go through an entire directory, folder by folder, and consider each file in tern?
  • How to copy the file (least processing time)?

Caveat

*efficient in the sense that the script execution should not 'hog' system's resources. Other more important applications maybe running concurrently.

Many thanks!

  • 2
    Try *something* and then come back and ask a specific question if you have a specific problem. – tzaman Sep 12 '14 at 19:33
  • what is the question? – Xstian Sep 12 '14 at 19:34
  • filenames in a text file, line by line. os.path.walk to visit the directories. shutil.copy2 to do the deed. – tdelaney Sep 12 '14 at 19:41
  • How I would I start... Bullet 1) Store the filenames in a list and loop through the list trying to locate a filename. Bullet 2) I'd use os.walk to recursively go through the directory, folder by folder, to consider each file. Bullet 3) I'd use shutil.copyfile to copy a file after each file was located. Without much knowledge of the language, this is what I'd do. What I'm looking for is a more efficient strategy. – WangJingHK Sep 12 '14 at 19:49
  • @WangJingHK: Premature optimization is the root of all evil. There's nothing wrong with what you describe except the filenames should be a set instead of a list for fast lookup. Even so, it seems to me that your time will be spent waiting on the operating system to make the copy. – Steven Rumbalski Sep 12 '14 at 19:54

2 Answers2

1
import os
import shutil

filenames_i_want = set() # fill this with the filenames you want
dest_dir = 'whatever'
src_dir = 'whatever'

for (dirpath, dirnames, filenames) in os.walk(src_dir):
    for fname in filenames:
        if fname in filenames_i_want:
            shutil.copy(os.path.join(dirpath, fname), dest_dir)

If this proves too slow use a profiler to figure out the slow parts and optimize from there.

If you find that shutil.copy is slow, refer to "Python copy larger file too slow".

Community
  • 1
  • 1
Steven Rumbalski
  • 44,786
  • 9
  • 89
  • 119
0

I think this would work (Using Python 2.7, since you didn't mention a Python version):

import os, shutil, sys

_files = []

dir = sys.argv[1]
targetDir = sys.argv[2]
endings = sys.argv[3:]

for root, dirs, files in os.walk(dir) :
    for ending in endings :
        if file.endswith(ending) :
            shutil.copy(os.path.join(root, file), os.path.join(targetDir, file)
            _files.append(file)
print _files

A good thing to note is that you have to call it like this:

python copyFiles.py /User/YourName/Documents/ /User/YourName/Desktop/ .txt
andrew
  • 448
  • 7
  • 13