I have a dictionary, dict, with some words (2000), and I have a huge text, like the Wikipedia corpus, in text format. For each word that is both in the dictionary and in the text file, I would like to replace it with word_1.

with open("wiki.txt", 'r') as original, open("new.txt", 'w') as mod:
    for line in original:
        new_line = line
        for word in line.split():
            if (dict.get(word.lower()) is not None):
                new_line = new_line.replace(word, word + "_1")
        mod.write(new_line)

This code creates a new file called new.txt with the words that appear in the dictionary replaced as I want.
This works for short files, but for the longer one that I am using as input, it "freezes" my computer.
Is there a more efficient way to do this?
Edit for Adi219:
Your code seems to work, but there is a problem: if a line is like this: Albert is a friend of Albert, and my dictionary contains Albert, then after the for loop the line will look like this: Albert_1_1 is a friend of Albert_1. How can I replace only the exact word that I want, to avoid repetitions like _1_1_1_1?
Edit2: To solve the previous problem, I changed your code:

with open("wiki.txt", "r") as original, open("new.txt", "w") as mod:
    for line in original:
        words = line.split()
        for word in words:
            if dict.get(word.lower()) is not None:
                mod.write(word + "_1 ")
            else:
                mod.write(word + " ")
        mod.write("\n")

Now everything should work.
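
For reference, the same one-replacement-per-occurrence idea can also be sketched with a regular expression. This is only a minimal sketch, not Adi219's code: tag_words, tag_file and vocabulary are hypothetical names, the dictionary keys are assumed to be lowercase, and UTF-8 is an assumed file encoding. Unlike the split-based loop, it keeps the original spacing and also matches a word that has punctuation attached to it.

```python
import re

# Hypothetical stand-in for the real 2000-word dictionary (keys assumed lowercase).
vocabulary = {"albert", "friend"}

# \w+ matches each run of letters, digits and underscores, i.e. one whole word.
WORD = re.compile(r"\w+")


def tag_words(line, vocabulary):
    """Append "_1" to every whole word of `line` whose lowercase form is in `vocabulary`."""
    def tag(match):
        word = match.group(0)
        return word + "_1" if word.lower() in vocabulary else word

    # re.sub scans the original line once, so an already tagged word is never tagged again.
    return WORD.sub(tag, line)


def tag_file(src, dst, vocabulary):
    """Stream `src` line by line and write the tagged text to `dst` (UTF-8 assumed)."""
    with open(src, "r", encoding="utf-8") as original, \
            open(dst, "w", encoding="utf-8") as mod:
        for line in original:
            mod.write(tag_words(line, vocabulary))


print(tag_words("Albert is a friend of Albert.", vocabulary))
# prints: Albert_1 is a friend_1 of Albert_1.
```

With the file names from the question, the call would be tag_file("wiki.txt", "new.txt", dict). Membership is tested with `in`, so a set of words works as well as a dictionary.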