
I will answer any questions I can.

Basically I have a list of 70 words that I am looking for in over 500 files, and I need to replace them with new words and numbers.

For example: find "hello" and replace it with "hello 233.4", but for 70 words/numbers across 500+ files.

I found an informative post here, and I have been reading about sys.argv, regular expressions, searches, replaces, and so on, but I cannot understand this bit of code. I have been "calling" it (I think) from a "cmd" window on Windows 7 with scriptname.py "-i " and "-o"...

Could someone please put the example path for the search list, "c:/input/file/path/searchlist.txt", and the example path for the file to be searched, "c:/search/this/file/searchme.txt", in their correct positions? (I will try to get it to repeat through every file in a folder on my own, and to highlight or bold the replacements on my own.)

I have tried many combinations... I could go over every modification I've made, and could type for days/pages/days/pages... each day/page getting dumber and dumber every time!

Thanks... or if you know of a different way, please suggest it.

Here is the link to the original post:

Use Python to search one .txt file for a list of words or phrases (and show the context)

Here is the code from the original post:

import re
import sys

def main():
  if len(sys.argv) != 3:
    print("Usage: %s fileofstufftofind filetofinditin" % sys.argv[0])
    sys.exit(1)

  with open(sys.argv[1]) as f:
    patterns = [r'\b%s\b' % re.escape(s.strip()) for s in f]
  there = re.compile('|'.join(patterns))

  with open(sys.argv[2]) as f:
    for i, s in enumerate(f):
      if there.search(s):
        print("Line %s: %r" % (i, s))

main()
user2779846

2 Answers

1

The code that you posted above is probably too complex for what you need for your assignment. Perhaps something simpler like the following is easier to understand:

# example variables
word_mapping = [['horse', 'donkey'], ['left', 'right']]
filename = 'C:/search/this/file/searchme.txt'

# load the text from the file ('r' means "reading")
with open(filename, 'r') as f:
    text = f.read()

# replace words in the text
for find_word, replacement in word_mapping:
    text = text.replace(find_word, replacement)

# save the modified text back to the same file ('w' means "writing")
with open(filename, 'w') as f:
    f.write(text)
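One caveat with str.replace is that it works on substrings, so 'left' would also be changed inside 'leftover'. If that matters for your word list, here is a minimal sketch of a whole-word variant that borrows the \b and re.escape idea from the script in the question. The function name replace_words is made up for this example.

```python
import re

def replace_words(text, word_mapping):
    """Replace whole words only; word_mapping is a list of [find, replacement] pairs."""
    lookup = dict(word_mapping)
    if not lookup:
        return text  # nothing to replace
    # Longest first, so a longer entry wins over a shorter one that is its prefix
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r'\b(?:%s)\b' % '|'.join(re.escape(w) for w in alternatives))
    # Using a function as the replacement keeps backslashes in the new text literal
    return pattern.sub(lambda m: lookup[m.group(0)], text)

print(replace_words('turn left at the leftover horse',
                    [['horse', 'donkey'], ['left', 'right']]))
```

Because everything happens in one pass, a replacement such as "hello 233.4" is not scanned again for "hello".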

For loading your list of words to replace, you could simply do something like:

words_path = 'C:/input/file/path/searchlist.txt'
with open(words_path) as f:
    word_mapping = [line.split() for line in f if line.strip()]  # skip blank lines

This expects each line of the file to hold the word to find followed by its replacement. str.split() splits a string on whitespace (spaces, tabs) by default, but you can split on other characters or even "words". If you have, for example, a comma-separated file, you can use line.split(',') to split on commas.
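Since a replacement like "hello 233.4" itself contains a space, a plain split() would give three pieces for that line instead of two. A possible workaround, assuming each line of the list is the search word followed by the full replacement, is to split only once. The helper name parse_mapping and the sample lines are made up for illustration.

```python
def parse_mapping(lines):
    """Turn lines like 'hello hello 233.4' into [find, replacement] pairs."""
    mapping = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=1: cut at the first run of whitespace only
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError('expected "word replacement", got %r' % line)
        mapping.append(parts)
    return mapping

sample = ['hello hello 233.4\n', '\n', 'horse\tdonkey\n']
print(parse_mapping(sample))
```

With a real file you would pass the open file object instead of the sample list, for example word_mapping = parse_mapping(f) inside the with block.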


As an explanation of the code you posted above: there are a couple of separate things happening, so let's break it down into pieces.

if len(sys.argv) != 3:
    print("Usage: %s fileofstufftofind filetofinditin" % sys.argv[0])
    sys.exit(1)

This particular script receives the paths to the word list and the target file as command line arguments, so you run it as python script_name.py wordslist_file target_file. In other words, you don't hardcode the file paths in the script, but let the user provide them at run time.

This first part of the code checks how many command line parameters have been passed to the script by checking the length of sys.argv, which is a list containing the command line parameters as strings. When the number of command line parameters is not equal to 3, a usage message is printed and the script exits. The first (or zeroth) argument is the filename of the script, which is why sys.argv[0] is printed as part of the message.
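To make the positions concrete with the paths from the question: the command would be python scriptname.py c:/input/file/path/searchlist.txt c:/search/this/file/searchme.txt, with no "-i" or "-o" flags, since the script does not look for any. The sketch below imitates what sys.argv would contain for that command; parse_args is a hypothetical helper, not part of the original script.

```python
def parse_args(argv):
    """Return (words_file, target_file) from a sys.argv-style list."""
    if len(argv) != 3:
        # Same usage message as the original script, raised instead of printed
        raise SystemExit("Usage: %s fileofstufftofind filetofinditin" % argv[0])
    return argv[1], argv[2]

# What sys.argv would look like for the command line above
fake_argv = ['scriptname.py',
             'c:/input/file/path/searchlist.txt',
             'c:/search/this/file/searchme.txt']
words_file, target_file = parse_args(fake_argv)
print(words_file)   # sys.argv[1]: the list of words to look for
print(target_file)  # sys.argv[2]: the file to search
```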

with open(sys.argv[1]) as f:
    patterns = [r'\b%s\b' % re.escape(s.strip()) for s in f]
    there = re.compile('|'.join(patterns))

This opens the file of words (its filename is sys.argv[1]), turns each line into a pattern, and compiles all of them into a single regular expression object that matches any of the words. The \b on either side marks a word boundary, so only whole words match, and re.escape makes sure characters such as "." are treated literally. Regular expressions give you more control over what is matched, but they have their own "mini-language", which can be quite confusing if you don't have experience with it. Note that this script only finds words and doesn't replace them, so the word file it uses contains only one "word" per line.
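To see what that combined pattern looks like, here is a small sketch with two made-up entries standing in for the lines of the word file. It also skips blank lines, which is an addition of mine: a blank line would otherwise become the pattern \b\b, which matches at any word boundary and so would flag almost every line.

```python
import re

# Stand-in for the lines read from the word file (made-up sample data)
words = ['hello\n', '\n', 'a.b\n']

# Same construction as the original script, plus a filter for blank lines
patterns = [r'\b%s\b' % re.escape(s.strip()) for s in words if s.strip()]
there = re.compile('|'.join(patterns))

print(there.pattern)                      # the joined pattern text
print(bool(there.search('say hello')))    # whole word present
print(bool(there.search('othello!')))     # 'hello' only as part of a longer word
print(bool(there.search('axb')))          # '.' was escaped, so it is not a wildcard
```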

with open(sys.argv[2]) as f:
    for i, s in enumerate(f):
        if there.search(s):
            print("Line %s: %r" % (i, s))

This opens the target file (its filename is the second command line parameter, sys.argv[2]) and loops over the lines in that file. If a line contains a word from the word list, the whole line is printed together with its index, which enumerate counts from 0.
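Because the script counts lines from 0, the printed number is one less than what a text editor shows. If that is confusing, enumerate accepts a start value. A minimal sketch, with find_lines as a made-up helper name and a list of strings standing in for the open file:

```python
import re

def find_lines(lines, there):
    """Return (line_number, line) pairs for matching lines, numbered from 1."""
    return [(i, s) for i, s in enumerate(lines, 1) if there.search(s)]

there = re.compile(r'\bhello\b')
sample_lines = ['hi there\n', 'hello world\n']
for i, s in find_lines(sample_lines, there):
    print("Line %s: %r" % (i, s))
```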

  • The script is working just fine. I am still trying to figure out the original code... so i can use a file with my list of words. Thank you for the solution, if you have any clarification on the first bit of code please? – user2779846 Nov 02 '13 at 00:21
  • @user2779846, I've added some additional explanation, is this what you wanted to know? –  Nov 02 '13 at 13:21
  • excellent explanation. I will be sure to post my final code. this makes sense. – user2779846 Nov 09 '13 at 05:05
0

Maybe you can try this: Find all files in a directory with extension .txt in Python

Put all 500 files in the same directory and process from there.
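As a rough sketch of how that could look, assuming all the files end in .txt and sit directly in one folder: glob lists the files, and each one gets the same find-and-replace pairs applied. The function name replace_in_folder is made up for this example. It overwrites files in place, so try it on a copy of the folder first, and keep the word list somewhere else so it does not get rewritten too.

```python
import glob
import os

def replace_in_folder(folder, word_mapping):
    """Apply every [find, replacement] pair to each .txt file directly inside folder.

    Returns the paths of the files that were actually changed.
    """
    changed = []
    for path in sorted(glob.glob(os.path.join(folder, '*.txt'))):
        with open(path, 'r') as f:
            text = f.read()
        new_text = text
        for find_word, replacement in word_mapping:
            new_text = new_text.replace(find_word, replacement)
        if new_text != text:
            # Only rewrite files where something was replaced
            with open(path, 'w') as f:
                f.write(new_text)
            changed.append(path)
    return changed

# Example call (commented out so nothing is modified by accident):
# replace_in_folder('C:/search/this/file', [['hello', 'hello 233.4']])
```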

  • Thank you for the link! This will definitely help "walk through" the file folder. I just still need to figure how to search them against my list. – user2779846 Oct 26 '13 at 06:22