
I am writing a simple Python script to find and replace strings in files, including files in sub-folders at any depth. This calls for recursion.

The following script replaces one string with another in every file inside every folder under the target parent folder.

I found a post on here suggesting the fileinput module, to avoid reading entire files into memory, which could slow things down...

...simplify the text replacement in a file without requiring to read the whole file in memory...

Credits @jfs

Python is very dynamic and, honestly, I get lost in the many different ways to accomplish the same task.

How can I integrate this approach into my script below?

import subprocess, os, fnmatch

if os.name == 'nt':
    def clear_console():
        subprocess.call("cls", shell=True)
        return
else:
    def clear_console():
        subprocess.call("clear", shell=True)
        return

# Globals
menuChoice = 0
searchCounter = 0

# Recursive find/replace with file extension argument.
def findReplace(directory, find, replace, fileExtension):

    global searchCounter

    #For all paths, sub-directories & files in (directory)...
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        #For each file found with (FileExtension)...
        for filename in fnmatch.filter(files, fileExtension):
            #Construct the target file path...
            filepath = os.path.join(path, filename)
            #Open file correspondent to target filepath.
            with open(filepath) as f:
                # Read it into memory.
                s = f.read()
            # Find and replace all occurrences of (find).
            s = s.replace(find, replace)
            # Write these new changes to the target file path.
            with open(filepath, "w") as f:
                f.write(s)
                # increment search counter by one.
                searchCounter += 1

    # Report final status.
    print ('  Files Searched: ' + str(searchCounter))
    print ('')
    print ('  Search Status : Complete')
    print ('')
    input ('  Press Enter to exit...')

def mainMenu():
    global menuChoice
    global searchCounter

    # Loop until a valid option is chosen; range(1, 2) contains only 1.
    while int(menuChoice) not in range(1, 2):

        clear_console()
        print ('')
        print ('  frx v1.0 - Menu')
        print ('')
        print ('  A. Select target file type extension.')
        print ('  B. Enter target directory name. eg -> target_directory/target_subfolder')
        print ('  C. Enter string to Find.')
        print ('  D. Enter string to Replace.')
        print ('')
        print ('  Menu')
        print ('')

        menuChoice = input('''
      1. All TXT  files. (*.txt )

      Enter Option: ''')
        print ('')

        # Format as int
        menuChoice = int(menuChoice)

        if menuChoice == 1:

            fextension = '*.txt'

            # Set directory name
            tdirectory = input('  Target directory name? ')
            tdirectory = str(tdirectory)
            print ('')

            # Set string to Find
            fstring = input('  String to find? (Ctrl + V) ')
            fstring = str(fstring)
            print ('')

            # Set string to Replace With
            rstring = input('  Replace with string? (Ctrl + V) ')
            rstring = str(rstring)
            print ('')

            # Report initial status
            print ('  Searching for occurrences of ' + fstring)
            print ('  Please wait...')
            print ('')

            # Call findReplace function
            findReplace('./' + tdirectory, fstring, rstring, fextension)

# Initialize program
mainMenu()

# Action Sample...
#findReplace("in this dir", "find string 1", "replace with string 2", "of this file extension")

# Confirm.
#print("done.")
suchislife
  • I would guess to do the `for line in FileInput(files, inplace=True):line.replace(text, replacement)` part, and for `files` use `fnmatch.filter(files, fileExtension)` – Peter Nov 22 '18 at 15:46

1 Answer


Your check that the inputs are '.txt' files is good; it relieves you of needing to worry about passing 'rb' or 'wb' to open().

You say you don't want to allocate N bytes for an N-byte file, for fear that occasionally N may be quite large. It is better to limit the memory allocation to the size of the longest text line rather than the size of the biggest file. Let's break out a helper function. Delete / replace these lines:

            #Open file correspondent to target filepath.
            with open(filepath) as f:
                # Read it into memory.
                s = f.read()
            # Find and replace all occurrences of (find).
            s = s.replace(find, replace)
            # Write these new changes to the target file path.
            with open(filepath, "w") as f:
                f.write(s)
                # increment search counter by one.
                searchCounter += 1

with a call to the helper function and then a bump of the counter:

            update(filepath, find, replace)
            searchCounter += 1

and then define the helper:

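def update(filepath, find, replace, temp_fspec='temp'):
    assert temp_fspec != filepath, filepath
    # Stream line by line so only one line is held in memory at a time.
    with open(filepath) as fin:
        # The temp file must be opened for writing.
        with open(temp_fspec, 'w') as fout:
            for line in fin:
                fout.write(line.replace(find, replace))
    # os.replace overwrites filepath on Windows too; os.rename would fail there.
    os.replace(temp_fspec, filepath)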

Using fileinput is not required here. By default it concatenates lines from many inputs into a single stream; with inplace=True it does redirect output back into each input file in turn, so it can meet your requirement of associating each output with its own input. Either way, the for line in idiom is what matters, and it works the same in fileinput as in the suggested update() helper.
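If you do want to try fileinput, here is a minimal sketch of how it could slot into the directory walk. The name find_replace_inplace is hypothetical, not part of your script, and the sketch assumes text files in the platform's default encoding. With inplace=True, whatever is printed inside the loop is written back into the file currently being read, so each file still receives only its own lines.

```python
import fileinput
import fnmatch
import os


def find_replace_inplace(directory, find, replace, pattern):
    """Replace `find` with `replace` in every matching file under `directory`.

    Hypothetical helper; returns the number of files processed.
    """
    paths = [
        os.path.join(root, name)
        for root, _dirs, files in os.walk(os.path.abspath(directory))
        for name in fnmatch.filter(files, pattern)
    ]
    if not paths:
        # fileinput reads standard input when given an empty list of files,
        # so bail out early instead.
        return 0
    with fileinput.input(files=paths, inplace=True) as stream:
        for line in stream:
            # With inplace=True, stdout is redirected into the current file.
            # end='' avoids doubling the newline already present in `line`.
            print(line.replace(find, replace), end='')
    return len(paths)
```

A call such as find_replace_inplace('./' + tdirectory, fstring, rstring, fextension) could stand in for the read/replace/write block, with its return value taking the place of searchCounter.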

Consider putting unusual characters in temp_fspec to reduce the chance of collision, or perhaps make it a fully qualified path in the same filesystem but above the affected subtree so it's guaranteed to never collide.
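Another way to avoid collisions, sketched here as an alternative rather than as the approach described above, is to let the standard tempfile module pick an unused name. The helper name update_safely is hypothetical. It assumes you are allowed to create a file next to the target, which keeps both files on the same filesystem.

```python
import os
import tempfile


def update_safely(filepath, find, replace):
    """Rewrite `filepath` line by line through a uniquely named temp file."""
    target_dir = os.path.dirname(os.path.abspath(filepath))
    # mkstemp creates a new file with an unused name and returns an open
    # descriptor plus its path.
    fd, temp_path = tempfile.mkstemp(dir=target_dir, suffix='.tmp')
    try:
        # Wrap the descriptor first so it is closed even if the input
        # file cannot be opened.
        with os.fdopen(fd, 'w') as fout, open(filepath) as fin:
            for line in fin:
                fout.write(line.replace(find, replace))
        # Swap the finished temp file into place.
        os.replace(temp_path, filepath)
    except BaseException:
        # Leave no stray temp file behind on failure.
        os.remove(temp_path)
        raise
```

One caveat: the rewritten file ends up with the temp file's permissions rather than the original's. If that matters, shutil.copymode(filepath, temp_path) just before the os.replace call would carry them over.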

This version should typically take a little longer to run, especially for lengthy files full of short lines. Maximum memory footprint for this version should be much smaller, if max file size >> max line length. If very long lines are a concern, then a binary chunking approach would be more appropriate, finessing the case where find might span a chunk boundary. We needn't handle that case in the current code if we assume that find does not contain '\n' newlines.
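For reference, a rough sketch of that chunking approach might look like the following. The function name, the separate source and destination paths, and the default chunk size are all assumptions made for illustration. It works on bytes, so find and replace would need to be encoded first (for example with fstring.encode()). The last len(find) - 1 bytes of each buffer are held back so that a match straddling two chunks is still seen whole.

```python
def replace_in_chunks(src_path, dst_path, find, replace, chunk_size=64 * 1024):
    """Copy src_path to dst_path, replacing the bytes `find` with `replace`.

    Hypothetical helper: memory use is bounded by roughly
    chunk_size + len(find), regardless of line length.
    """
    if not find:
        raise ValueError('find must be non-empty')
    with open(src_path, 'rb') as fin, open(dst_path, 'wb') as fout:
        buf = b''
        while True:
            chunk = fin.read(chunk_size)
            buf += chunk
            # Everything before the final piece ended in a complete match.
            *done, rest = buf.split(find)
            for piece in done:
                fout.write(piece)
                fout.write(replace)
            # Hold back a tail that could be the start of a match completed
            # by the next chunk; at end of file nothing more can arrive.
            hold = len(find) - 1 if chunk else 0
            cut = max(len(rest) - hold, 0)
            fout.write(rest[:cut])
            buf = rest[cut:]
            if not chunk:
                break
```

Only unprocessed input bytes are ever carried over, never replacement text, so the replacement cannot accidentally combine with later input to form a new match.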

We can simplify two versions of your clear screen routine down to one by phrasing it this way:

def clear_console():
    clear = 'cls' if os.name == 'nt' else 'clear'
    subprocess.call(clear, shell=True)
    return
J_H