I'm trying to convert a CSV file into a dictionary and then replace each key found in a text with its value. I tried several approaches, but none of them worked.
My CSV file (fr.csv):
chat,chats
noir,noirs
km,k
table,tables
être,sont
être,est
cinema,cinemas
fermer,fermés
casser,cassées
My script:
import csv
import re

with open('fr.csv', 'r') as lemmas:
    lematizer = csv.reader(lemmas, delimiter = ',')
    text = "chats sont noirs tables sont cassées ils sont tristes la vie est belle cinemas sont fermés"
    tuples = []
    for tokens in lematizer:
        tuples.append(tuple([tokens[1], tokens[0]]))
result = dict(tuples)
lemmatized_text = re.sub(r'\b(%s)\b' % '|'.join(result.keys()), lambda m:result.get(m.group(1), m.group(1)), text)
print(lemmatized_text)
Output is good!
chat être noir table être casser il etre triste le vie etre beau cinema être fermer
But when I try to do the same on a large text with many lines, it doesn't work: the text is printed unmodified. Even worse, some accented characters come out garbled, although everything is in UTF-8:
colis livr� �
Encodings:
fr.csv: UTF-8 Unicode text
text.csv: UTF-8 Unicode text, with very long lines
.py: Python script, UTF-8 Unicode text executable
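For comparison, here is a minimal sketch of the same approach applied line by line to a file, with the encoding passed explicitly to every open() call. It assumes Python 3, that both files really are UTF-8, and that the CSV has the replacement in the first column and the word to replace in the second. The function names load_lemmas, lemmatize and lemmatize_file are hypothetical, and whether the missing encoding argument is actually the cause of the garbled characters is only a guess.

```python
import csv
import re


def load_lemmas(csv_path):
    """Map each word in column 2 to its replacement in column 1."""
    # encoding is passed explicitly instead of relying on the platform default
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {row[1]: row[0] for row in csv.reader(handle) if len(row) >= 2}


def lemmatize(text, lemmas):
    """Replace every whole-word key of `lemmas` found in `text` with its value."""
    if not lemmas:
        return text
    # Escape each key so regex metacharacters are treated literally,
    # and try longer keys first.
    keys = sorted(lemmas, key=len, reverse=True)
    pattern = re.compile(r"\b(%s)\b" % "|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: lemmas[m.group(1)], text)


def lemmatize_file(text_path, csv_path):
    """Return the lines of `text_path` with the replacements applied."""
    lemmas = load_lemmas(csv_path)
    with open(text_path, encoding="utf-8") as source:
        return [lemmatize(line.rstrip("\n"), lemmas) for line in source]
```

With the files above, this would be called as lemmatize_file("text.csv", "fr.csv"), and each returned line could then be printed or written to a file that is also opened with encoding="utf-8".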