
I have a large number of .txt files named "cb" + number (like cb10, cb13), and I need to pick a specific subset of them out of a source folder that contains all the "cb" + number files, including the target files.

The numbers in the target file names follow no pattern, so I have to list the file names explicitly.

import fnmatch
import os
import shutil
os.chdir('/Users/college_board_selection')
os.getcwd()
source = '/Users/college_board_selection'
dest = '/Users/seperated_files'
files = os.listdir(source)

for f in os.listdir('.'):
    names = ['cb10.txt','cb11.txt']
    if names in f:
        shutil.move(f,dest)
user3483203
gzhang7

1 Answer


if names in f: isn't going to work: f is a filename (a string) and names is a list, so the test raises a TypeError instead of matching anything. You probably want if f in names:
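To make the direction of the test concrete, here is a minimal sketch that uses the two names from the question. The variables raised and is_target are only there for illustration:

```python
names = ['cb10.txt', 'cb11.txt']
f = 'cb10.txt'

# Wrong direction: with a str on the right-hand side, `in` expects a str
# on the left, so passing a list raises TypeError.
try:
    names in f
    raised = False
except TypeError:
    raised = True

# Right direction: ask whether the filename is one of the wanted names.
is_target = f in names
```

So the original loop would fail on the first file rather than silently skipping everything.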

But you don't need to scan a whole directory for this. Just loop over the files you're targeting and move the ones that exist:

for f in ['cb10.txt','cb11.txt']:
    if os.path.exists(f):
        shutil.move(f,dest)

If you have a lot of cbxxx.txt files, an alternative is to compute the intersection of this list with the result of os.listdir using a set (faster lookup than a list, which is worthwhile if there are a lot of elements):

for f in {'cb10.txt','cb11.txt'}.intersection(os.listdir(".")):
    shutil.move(f,dest)

On Linux, with a lot of "cb" files, this would be faster because listdir only reads the directory entries and doesn't stat each file, whereas os.path.exists performs a stat call for every name it checks.
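As a sketch of the set approach that doesn't depend on the current working directory, something like the following could work. The function name move_present and its return value are my own invention, and it assumes dest already exists:

```python
import os
import shutil

def move_present(source, dest, names):
    """Move the files from `names` that exist in `source` into `dest`.

    Hypothetical helper: returns (moved, missing) as sorted lists so the
    caller can see which requested names were not found.
    """
    wanted = set(names)
    # One directory read, then a set intersection: no per-file stat call.
    present = wanted.intersection(os.listdir(source))
    for f in present:
        shutil.move(os.path.join(source, f), os.path.join(dest, f))
    return sorted(present), sorted(wanted - present)
```

Called as move_present(source, dest, ['cb10.txt', 'cb11.txt']), it also tells you which target files were absent, which the plain loop does not.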

EDIT: if the files have the same prefix/suffix, you can build the lookup set with a set comprehension to avoid tedious copy/paste:

s = {'cb{}.txt'.format(i) for i in ('10','11')}
for f in s.intersection(os.listdir(".")):
    shutil.move(f,dest)

or for the first alternative:

for p in ['10','11']:
    f = "cb{}.txt".format(p)
    if os.path.exists(f):
        shutil.move(f,dest)
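If you'd rather keep only the numbers in your source, a small helper can build the names for you. This is just a sketch: cb_names and its prefix/suffix parameters are made up for the example, with defaults matching the cb<number>.txt pattern from the question:

```python
def cb_names(numbers, prefix="cb", suffix=".txt"):
    """Build the set of file names for the given numbers (ints or strings)."""
    return {"{}{}{}".format(prefix, n, suffix) for n in numbers}
```

For instance, cb_names([10, 11]) gives the same set as the literal {'cb10.txt', 'cb11.txt'}, ready to be intersected with os.listdir.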

EDIT: if all cb*.txt files must be moved, then you can use glob.glob("cb*.txt"). I won't elaborate, since the linked "duplicate target" answer explains it better.
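For completeness, here is a rough sketch of combining glob with a filter on the number, for the case where only some of the cb files are wanted. The function name move_cb_by_number is hypothetical, dest is assumed to exist, and the regular expression assumes the names are exactly cb, digits, .txt:

```python
import glob
import os
import re
import shutil

def move_cb_by_number(source, dest, numbers):
    """Move cb<number>.txt files from `source` to `dest` for the given numbers."""
    wanted = {int(n) for n in numbers}
    moved = []
    # glob.escape guards against special characters in the folder path itself.
    pattern = os.path.join(glob.escape(source), "cb*.txt")
    for path in glob.glob(pattern):
        name = os.path.basename(path)
        # cb*.txt also matches names like cbfoo.txt, so check the digits explicitly.
        m = re.fullmatch(r"cb(\d+)\.txt", name)
        # Numbers are compared as ints, so cb010.txt would count as 10.
        if m and int(m.group(1)) in wanted:
            shutil.move(path, os.path.join(dest, name))
            moved.append(name)
    return sorted(moved)
```

This still scans the whole directory, so compared with the set intersection it mostly buys you the pattern check, not speed.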

Jean-François Fabre
  • Since all the file names have the same cb_.txt format, do you know how I can just leave the numbers in the for f in ['10', '11']? – gzhang7 May 02 '18 at 19:53
  • no but you can build the list from a set comprehension. Let me edit. – Jean-François Fabre May 02 '18 at 20:09
  • ... or you use the `glob` module: `files = glob.glob('/Users/college_board_selection/cb*.txt')` – Oliver Baumann May 02 '18 at 20:21
  • @OliverBaumann now that OP has revealed that there's a pattern, why not, but it will scan the whole directory and you'll still have to keep the cb files with relevant numbers, not all of them, so the benefit is low, even negative. but re-reading the question you seem to be right... – Jean-François Fabre May 02 '18 at 20:23