Find files in a directory containing desired string in Python

Question

I'm trying to find a string in the files contained within a directory. I have a string, such as banana, that I know exists in a few of the files.

import os
import sys

user_input = input("What is the name of your directory?")
directory = os.listdir(user_input)
searchString = input("What word are you trying to find?")

for fname in directory: # change directory as needed
    if searchString in fname:
        f = open(fname,'r')
        print('found string in file %s') %fname
    else:
        print('string not found')

When the program runs, it just outputs string not found for every file. There are three files that contain the word banana, so the program isn't working as it should. Why isn't it finding the string in the files?

why don't you add a `print fname` on the for loop to see what you are getting — The6thSense, Dec 30 '15 at 13:17
alright, pasted it. i just added a print fname on the for loop, and i got this output: What is the name of your directory?example What word are you trying to find?banana 1.txt string not found 2.txt string not found 3.txt string not found — rigning, Dec 30 '15 at 13:26

Kenly · Accepted Answer · 2015-12-31T07:22:23.147

score 13

You are searching for the string in the file name, not in the file's content. To search the content, read it with open(filename, 'r').read():

import os

user_input = input('What is the name of your directory')
directory = os.listdir(user_input)

searchstring = input('What word are you trying to find?')

for fname in directory:
    if os.path.isfile(user_input + os.sep + fname):
        # Full path
        f = open(user_input + os.sep + fname, 'r')

        if searchstring in f.read():
            print('found string in file %s' % fname)
        else:
            print('string not found')
        f.close()

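We use user_input + os.sep + fname to build the full path, because os.listdir returns bare names relative to the directory rather than full paths.
os.listdir returns the names of both files and directories, so we use os.path.isfile to keep only the files.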
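
The same approach can be sketched with os.path.join, which inserts the platform's separator, and a with block, which closes the file even if reading fails. The function name find_files_containing and the errors='ignore' setting are assumptions made for this illustration; they are not part of the answer above.

```python
import os


def find_files_containing(directory, searchstring):
    """Return the names of regular files in `directory` whose text contains `searchstring`."""
    matches = []
    for fname in sorted(os.listdir(directory)):
        # Same role as user_input + os.sep + fname in the answer
        full_path = os.path.join(directory, fname)
        if not os.path.isfile(full_path):
            continue  # skip sub-directories
        # Assumption: errors='ignore' drops bytes that cannot be decoded as text
        with open(full_path, 'r', errors='ignore') as f:
            if searchstring in f.read():
                matches.append(fname)
    return matches
```

Called as find_files_containing('example', 'banana'), it would return the names of the files in example whose content includes banana, leaving the printing to the caller.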

edited Dec 31 '15 at 07:22

answered Dec 30 '15 at 13:26


Ohh i see i was searching for string in filename. Thanks zetysz, that makes sense. But i am getting an error: File "C:\Users\XX\Desktop\python exercises\practice.py", line 12, in f = open(fname,'r') FileNotFoundError: [Errno 2] No such file or directory: '1.txt' – rigning Dec 30 '15 at 13:37
but it does. the directory path is desktop\python exercises\example\1.txt – rigning Dec 30 '15 at 13:43
i'm not sure. for the user_input part when i ran the program, i typed 'example' and then for the searchstring, the string i was looking for. – rigning Dec 30 '15 at 13:45
a word in the file content. – rigning Dec 30 '15 at 13:46
Again that makes sense. I ran the program now with your edit. But it just asks me for two inputs, then it outputs nothing else. – rigning Dec 30 '15 at 14:03
@rigning First it asks for the directory name, then it asks for a searchstring. – Kenly Dec 30 '15 at 14:09
oh it works now. i changed the line to print('found string in file %s' % fname) previously the % was outside the brackets. can i just ask, what does if os.path.isfile(user_input + os.sep + fname): do? What does (user_input + os.sep + fname) do? Thanks. – rigning Dec 30 '15 at 14:55

RSale · Answer 2 · 2021-12-29T12:30:30.547

Here is another version using the Path class from the pathlib module instead of os.

from pathlib import Path


def search_in_file(path, searchstring):
    # The with block closes the file automatically
    with open(path, 'r') as file:
        if searchstring in file.read():
            print(f'  found string in file {path.name}')
        else:
            print('string not found')


user_input = input('What is the name of your directory? ')
searchstring = input('What word are you trying to find? ')

# iterdir() yields full Path objects, so no manual path joining is needed
dir_content = sorted(Path(user_input).iterdir())

for path in dir_content:
    # Skip sub-directories
    if not path.is_dir():
        search_in_file(path, searchstring)
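
A possible variation on the pathlib version, sketched here under a few assumptions: it returns the matching paths instead of printing, reads each file with Path.read_text, and can optionally descend into sub-directories with Path.rglob. The function name files_containing, the recursive parameter and errors='ignore' are choices made for this sketch, not part of the answer.

```python
from pathlib import Path


def files_containing(directory, searchstring, recursive=False):
    """Return the paths of files under `directory` whose text contains `searchstring`."""
    base = Path(directory)
    # rglob('*') walks sub-directories as well; iterdir() lists one level only
    candidates = base.rglob('*') if recursive else base.iterdir()
    found = []
    for path in sorted(candidates):
        if not path.is_file():
            continue
        # read_text opens, reads and closes the file in a single call
        if searchstring in path.read_text(errors='ignore'):
            found.append(path)
    return found
```

Printing path.name for each returned item reproduces the messages of the original loop.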

score 1 · Answer 3 · answered Apr 03 '22 at 10:40

This is my solution for the problem. It comes with the feature of also checking in sub-directories, as well as being able to handle multiple file types. It is also quite easy to add support for other ones. The downside is of course that it's quite chunky code. But let me know what you think.

import os
import docx2txt
from pptx import Presentation
import pdfplumber

def findFiles(strings, dir, subDirs, fileContent, fileExtensions):
    # Finds all the files in 'dir' that contain at least one string from 'strings'.
    # Additional parameters:
    # 'subDirs': True/False : Also look in sub-directories of 'dir'
    # 'fileContent': True/False : Also look for the strings in the content of every file
    # 'fileExtensions': True/False : Match the strings against the file extension only -> 'fileContent' is ignored
    filesInDir = []
    foundFiles = []
    filesFound = 0

    if not subDirs:
        for filename in os.listdir(dir):
            if os.path.isfile(os.path.join(dir, filename).replace("\\", "/")):
                filesInDir.append(os.path.join(dir, filename).replace("\\", "/"))
    else:
        for root, subdirs, files in os.walk(dir):
            for f in files:
                if not os.path.isdir(os.path.join(root, f).replace("\\", "/")):
                    filesInDir.append(os.path.join(root, f).replace("\\", "/"))
    print(filesInDir)
    # Find files that contain the keyword
    if filesInDir:
        for file in filesInDir:
            print("Current file: "+file)
            # Define what is to be searched in
            filename, extension = os.path.splitext(file)
            if fileExtensions:
                fileText = extension
            else:
                fileText = os.path.basename(filename).lower()
                if fileContent:
                    fileText +=  getFileContent(file).lower()
            # Check for translations
            for string in strings:
                print(string)
                if string in fileText:
                    foundFiles.append(file)
                    filesFound += 1
                    break
    return foundFiles

def getFileContent(filename):
    '''Returns the content of a file of a supported type (list: supportedTypes), or "" otherwise'''
    # Use the real extension; partition(".") would split at the first dot anywhere in the path
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in supportedTypes:
        return ""
    if extension == "pdf":
        content = ""
        with pdfplumber.open(filename) as pdf:
            for page in pdf.pages:
                # Guard against pages for which no text is returned
                content = content + (page.extract_text() or "")
        return content
    elif extension == "txt":
        # The with block closes the file, so no explicit close() is needed
        with open(filename, 'r') as f:
            return f.read()
    elif extension == "docx":
        return docx2txt.process(filename)
    elif extension == "pptx":
        content = ""
        prs = Presentation(filename)
        for slide in prs.slides:
            for shape in slide.shapes:
                if hasattr(shape, "text"):
                    content = content + shape.text
        return content
    return ""

supportedTypes = ["txt", "docx", "pdf", "pptx"]
print(findFiles(strings=["buch"], dir="C:/Users/User/Desktop/",  subDirs=True, fileContent=True, fileExtensions=False))
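
The answer mentions that support for further file types is easy to add. One way to keep that manageable, sketched here as an assumption rather than as the answer's design, is a dictionary that maps each extension to a reader function. The names READERS, read_plain_text and get_file_content are hypothetical, and only a plain-text reader built on the standard library is shown; readers for pdf, docx or pptx could be registered in the same way.

```python
import os


def read_plain_text(path):
    """Read a text file, ignoring bytes that cannot be decoded."""
    with open(path, 'r', errors='ignore') as f:
        return f.read()


# Hypothetical registry: lower-case extension (with dot) -> reader function
READERS = {
    ".txt": read_plain_text,
    ".md": read_plain_text,
}


def get_file_content(path):
    """Return the text of `path`, or "" when no reader is registered for its extension."""
    extension = os.path.splitext(path)[1].lower()
    reader = READERS.get(extension)
    return reader(path) if reader else ""
```

Supporting a new type then means adding one entry to READERS instead of another elif branch.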

score 0 · Answer 4 · answered Sep 10 '20 at 20:41

Here is the simplest answer I can give you. You don't need the colors, they are just cool, and you may find that you can learn more than one thing from my code :)

import os
from time import sleep

#The colours of the things
class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'

# Ask the user to enter string to search
search_path = input("Enter directory path to search : ")
file_type = input("File Type : ")
search_str = input("Enter the search string : ")

# Append a directory separator if not already present
if not (search_path.endswith("/") or search_path.endswith("\\") ): 
        search_path = search_path + "/"
                                                          
# If path does not exist, set search path to current directory
if not os.path.exists(search_path):
        search_path = "./"  # keep the trailing separator: file names are appended directly

# Repeat for each file in the directory
for fname in os.listdir(path=search_path):

    # Apply file type filter
    if fname.endswith(file_type):

        # Open the file and read its whole content in one go
        with open(search_path + fname, 'r') as fo:
            content = fo.read()

        # Empty files are skipped without a message
        if content != '':
            # Search for the string anywhere in the content
            if search_str in content:
                print(bcolors.OKGREEN + '[+]' + bcolors.ENDC + ' ', fname, sep="")
                print('      ')
                sleep(0.01)
            else:
                print(bcolors.FAIL + '[-]' + bcolors.ENDC + ' ', fname, ' ', 'does not contain', ' ', search_str, sep="")
                print("       ")
                sleep(0.01)

That is it!

score 0 · Answer 5 · answered May 18 '21 at 03:08

I tried the following code for this kind of problem; please have a look.

import os

search_path=input("Put the directory here:")

search_str = input("Enter your string")

# Append a directory separator if not already present
if not (search_path.endswith("/") or search_path.endswith("\\") ): 
        search_path = search_path + "/"
                                                          
# If path does not exist, set search path to current directory
if not os.path.exists(search_path):
        search_path = "./"  # keep the trailing separator: file names are appended directly

# Repeat for each file in the directory  
for fname in os.listdir(path=search_path):

   # Only regular files can be opened for reading
   if os.path.isfile(search_path + fname):

        # Open file for reading
        fo = open(search_path + fname)

        # Read the first line from the file
        line = fo.readline()

        # Initialize counter for line number
        line_no = 1

        # Loop until EOF
        while line != '' :
                # Search for string in line
                index = line.find(search_str)
                if ( index != -1) :
                    print(fname, "[", line_no, ",", index, "] ", line, sep="")

                # Read next line
                line = fo.readline()  

                # Increment line counter
                line_no += 1
        # Close the files
        fo.close()
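
The readline loop above reports the line number and the column of each match. A more compact sketch of the same idea, assuming a hypothetical helper named find_in_file, iterates over the file object with enumerate and returns the hits instead of printing them; errors='ignore' is likewise an assumption of this sketch.

```python
def find_in_file(path, search_str):
    """Return (line_no, index, line) for every line of `path` that contains `search_str`."""
    hits = []
    with open(path, 'r', errors='ignore') as fo:
        # enumerate(..., start=1) replaces the manual line counter
        for line_no, line in enumerate(fo, start=1):
            index = line.find(search_str)
            if index != -1:
                hits.append((line_no, index, line.rstrip('\n')))
    return hits
```

Each tuple holds the 1-based line number, the 0-based position of the first match in that line, and the line without its trailing newline.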
