0

In Python 2.7, I want to find and count the files whose names contain any string from a given list.

List of files:

  • Passport_Mike.pdf
  • David-Passport.pdf
  • Iain ID Card.pdf
  • CopyPassport Michael.pdf
  • Driving License John.pdf

I want to count all files that have 'Passport' or 'ID' in their names.

Currently I split each file name into words on a set of delimiters ( _-/') and look for the search terms among those words. This misses some files, because a term is not always set off by a delimiter: in 'CopyPassport Michael', nothing separates 'Passport' from 'Copy'.
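To make the limitation concrete, here is a minimal sketch. It uses re.split rather than the split function shown further down, and split_name is a hypothetical helper introduced only for this illustration:

```python
import re

# Hypothetical helper (not part of the original code): split a file name on
# the delimiters mentioned above: space, underscore, hyphen, slash, apostrophe.
def split_name(name):
    return [part for part in re.split(r"[ _\-/']+", name) if part]

tokens = split_name("CopyPassport Michael.pdf")

# 'Passport' never becomes a token of its own, so an exact-token lookup misses it,
# while a plain substring test on the whole name finds it.
found_as_token = "Passport" in tokens
found_as_substring = "Passport" in "CopyPassport Michael.pdf"
```

Here found_as_token is False and found_as_substring is True, which is the gap described above.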

My code is based on an answer given to another question, and it uses collections.Counter().

Here is my code:

from collections import Counter

listOfFiles = ['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf', 'CopyPassport Michael.pdf', 'Driving License John.pdf']
searrchTermsList = ['Passport', 'ID']

def fileSplit(string, delimiters):
    delimiters = tuple(delimiters)
    stack = [string,]

    for delimiter in delimiters:
        for i, substring in enumerate(stack):
            substack = substring.split(delimiter)
            stack.pop(i)
            for j, _substring in enumerate(substack):
                stack.insert(i+j, _substring)
    return stack
# This split function is complicated, but it breaks each file name into the parts my next function needs. Other split methods didn't work for me.

def searchTermsCount(listOfFiles, searchTermsList):
    counts = Counter()
    for myFile in listOfFiles:
        myFileSplit = fileSplit(myFile, ('_', ' ', '-', '.'))
        # Words are stored upper-cased, so the lookup below upper-cases the terms too.
        counts.update(word.upper() for word in myFileSplit)
    myCount = 0
    for word in searchTermsList:
        myCount += counts[word.upper()]
    print "Count files:", myCount

What is a Python 2.7 method to count the files whose names contain any string from a list, without relying on delimiters?

Community
Jelle

2 Answers

1

Try this:

listOfFiles = ['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf', 'CopyPassport Michael.pdf', 'Driving License John.pdf']
searrchTermsList = ["Passport", 'ID']
relevantfiles = [filename for filename in listOfFiles if any(searchterm in filename for searchterm in searrchTermsList)]
print(relevantfiles)

Output:

['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf', 'CopyPassport Michael.pdf']
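Since the question asks for a count rather than the list itself, the same substring test can be summed directly. This is a minimal sketch; searchTerms and matchCount are names chosen here for illustration, and the print call is written so it runs on both Python 2.7 and Python 3:

```python
listOfFiles = ['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf',
               'CopyPassport Michael.pdf', 'Driving License John.pdf']
searchTerms = ['Passport', 'ID']  # illustrative name for the list of search terms

# Count each file once if its name contains at least one of the terms.
matchCount = sum(1 for filename in listOfFiles
                 if any(term in filename for term in searchTerms))

print("Count files: %d" % matchCount)
```

For the file list above, matchCount is 4. Calling len(relevantfiles) on the list built earlier gives the same number.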
zondo
0
listOfFiles = ['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf', 'CopyPassport Michael.pdf', 'Driving License John.pdf']
searrchTermsList = ['Passport', 'ID']

filenamesWithTerms = [] 
for filename in listOfFiles:
    for term in searrchTermsList:
        if term in filename:
            filenamesWithTerms.append(filename) 
            break
print filenamesWithTerms
>>['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf', 'CopyPassport Michael.pdf']
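If a count per search term is wanted as well, the same loop can feed a collections.Counter, which the question already uses. This is a sketch under that assumption; searchTerms and perTerm are names introduced here for illustration:

```python
from collections import Counter

listOfFiles = ['Passport_Mike.pdf', 'David-Passport.pdf', 'Iain ID Card.pdf',
               'CopyPassport Michael.pdf', 'Driving License John.pdf']
searchTerms = ['Passport', 'ID']  # illustrative name for the list of search terms

# Tally, for each term, how many file names contain it as a substring.
perTerm = Counter()
for filename in listOfFiles:
    for term in searchTerms:
        if term in filename:
            perTerm[term] += 1
```

With this file list, perTerm['Passport'] is 3 and perTerm['ID'] is 1. The test is case-sensitive on purpose: lower-casing both sides would make 'id' match inside 'David-Passport.pdf', so short terms such as 'ID' need care if case is ignored.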
Forge