
I have a function which iterates over a text file, matches words to keys in a dictionary, and replaces those words with the keys' values:

import re

def replace_operators(text):
    operators = {'order': '"order"'}
    f = open(text, 'r').read()

    for k, v in operators.items():
        cleaned = re.sub(r"\b%s\b" % k, v, f)
        f = open(text, 'w')
        f.truncate(0)
        f.close()
        text_file = open(text, 'w')
        text_file.write(cleaned)
        text_file.close()

This works fine; however, when I add another key to the dictionary, I receive:

TypeError: expected string or bytes-like object

I've tried replacing f with str(f) in the cleaned line (as suggested by this answer); however, this only writes the following line to my output file:

<_io.TextIOWrapper name='path/of/outfile' mode='w' encoding='cp1252'>
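A minimal reproduction of both symptoms, assuming a throwaway temporary file (the file name and contents are illustrative, not from the question). After the first iteration `f` has been rebound to a file object, so `re.sub` rejects it, and `str(f)` only yields the object's repr, not the file's contents.

```python
import os
import re
import tempfile

# Illustrative temporary file standing in for the asker's text file.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as fh:
    fh.write("order matters")

f = open(path, "r").read()  # first: f is a str holding the contents
f = open(path, "w")         # then: f is rebound to a file object (and truncates the file)
f.close()

error = None
try:
    # This is what the second loop iteration effectively does.
    re.sub(r"\border\b", '"order"', f)
except TypeError as exc:
    error = exc  # "expected string or bytes-like object..."

# str() on a file object returns its repr, not the text it contained.
as_text = str(f)
```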

Does anyone know how I can add more keys without getting this kind of error?

Laurie
  • After your first time through the `for k, v in ...` loop, `f` is the closed file object, not the result of the initial read. – Adam Smith Nov 06 '18 at 21:08
  • Also, if you open the file in write mode repeatedly, you will clear all the previously stored content of the file on each iteration. – Gahan Nov 06 '18 at 21:11
  • Both your points make a lot of sense, thank you. Is there a solution that you would suggest? – Laurie Nov 06 '18 at 21:13
  • One place to start is to change `'w'` to `'a'`, which means append, and then move your f.close() and text_file.close() outside of your for loop. – d_kennetz Nov 06 '18 at 21:20
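Following the first two comments, here is a minimal sketch of a repaired loop: read the file once, apply every substitution to the in-memory string, and write the result once. Passing `operators` as a parameter is an assumption, a change from the original signature, and the values are assumed to contain no backslashes (which `re.sub` would interpret as escapes).

```python
import re


def replace_operators(path, operators):
    """Read once, apply each substitution to the string, write once."""
    with open(path, "r") as f:
        contents = f.read()  # keep the text in its own variable, never reuse f

    for key, value in operators.items():
        # Substitute on the string, not on a file object.
        # re.escape guards against keys containing regex metacharacters.
        contents = re.sub(r"\b%s\b" % re.escape(key), value, contents)

    # Opening in "w" mode already truncates the file; no truncate() needed.
    with open(path, "w") as f:
        f.write(contents)
```

Because the substitutions run one after another, a value that contains another key can be rewritten again by a later iteration; the single-pass approach in the answer below does not have that problem.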


You don't need a loop for this, or to replace and write the file several times. A very efficient approach is:

  • open and read the file
  • call re.sub with a replacement function (here a lambda) that looks up each matched word in the dictionary and returns the word itself if it is not found
  • open and write the file (or a new file)

Like this:

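import re

text = "input.txt"

operators = {'order': '"order"', 'matter': '"matter"'}

# Read the whole file into a string once.
with open(text, 'r') as f:
    contents = f.read()

# Replace every word that is a key in operators; leave all other words unchanged.
cleaned = re.sub(r"\b(\w+)\b", lambda m: operators.get(m.group(1), m.group(1)), contents)

# Write the result to a new file (use text instead to overwrite the original).
with open("new_" + text, 'w') as f:
    f.write(cleaned)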

This little-known feature is very powerful: re.sub accepts a function as the replacement instead of a string. The function takes the match object as input and returns the string that should replace the match. Here it is an anonymous function (lambda):

lambda m: operators.get(m.group(1), m.group(1))

So if the matched word is in the dictionary, it is replaced by the corresponding value; otherwise the original word is returned unchanged.
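To make the callback behaviour concrete, here is a small sketch on an in-memory string (the sample sentence is illustrative). It also shows a named function, `lookup` (a hypothetical name), that behaves exactly like the lambda, which some readers find easier to follow.

```python
import re

operators = {'order': '"order"', 'matter': '"matter"'}


def lookup(m):
    """Named equivalent of the lambda: the dictionary value if present, else the word itself."""
    word = m.group(1)
    return operators.get(word, word)


text = "order does not matter here"
with_lambda = re.sub(r"\b(\w+)\b", lambda m: operators.get(m.group(1), m.group(1)), text)
with_function = re.sub(r"\b(\w+)\b", lookup, text)
print(with_lambda)  # "order" does not "matter" here
```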

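All of this happens in a single pass over the text, without a Python loop over the dictionary, and each word lookup is O(1) on average. It therefore stays fast even if you have a lot of items in your dictionary, as opposed to a linear approach that runs one replacement per key, or to building a single pattern of keywords with "|".join(), which starts to crawl when you have 1000+ items to search and replace.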
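For comparison, a sketch of the "|".join() alternative mentioned above, which the answer notes slows down with very many keys. One reason it may still be worth knowing: the \w+ pattern can only ever match single words, so a key containing a space (the hypothetical 'order by' below) would never be found, while an alternation of re.escape'd keys can match it. The sketch assumes every key starts and ends with a word character, so the \b anchors apply.

```python
import re


def replace_with_alternation(text, operators):
    """Single-pass replacement using one alternation built from all escaped keys."""
    if not operators:
        return text  # an empty alternation would match the empty string everywhere
    # Longest keys first, so "order by" is tried before "order" at the same position.
    keys = sorted(operators, key=len, reverse=True)
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, keys)))
    # Each match is literally one of the keys, so a direct lookup is safe.
    return pattern.sub(lambda m: operators[m.group(0)], text)
```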

Jean-François Fabre