
I'm trying to convert a CSV file into a dictionary and then replace every key found in a text with its value. I tried several things, but none of them worked.

My CSV file (fr.csv):

chat,chats
noir,noirs
km,k
table,tables
être,sont
être,est
cinema,cinemas
fermer,fermés
casser,cassées

My script:

import csv
import re

with open('fr.csv', 'r') as lemmas:
    lematizer = csv.reader(lemmas, delimiter = ',')

    text = "chats sont noirs tables sont cassées ils sont tristes la vie est belle cinemas sont fermés"

    # Map each inflected form (column 2) to its lemma (column 1).
    tuples = []
    for tokens in lematizer:
        tuples.append((tokens[1], tokens[0]))
    result = dict(tuples)

    lemmatized_text = re.sub(r'\b(%s)\b' % '|'.join(result.keys()), lambda m:result.get(m.group(1), m.group(1)), text)

    print(lemmatized_text)

The output is correct:

chat être noir table être casser il etre triste le vie etre beau cinema être fermer

But when I try to do the same with a large, multi-line text, it doesn't work: the text is printed unmodified. Even worse, accented characters come out garbled, even though everything is in UTF-8:

colis livr� �

Encodings:

fr.csv: UTF-8 Unicode text
text.csv: UTF-8 Unicode text, with very long lines
.py: Python script, UTF-8 Unicode text executable 
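The encodings listed above describe the bytes on disk, not what Python decodes with by default or what the console can display. A quick check, assuming Python 3 (the question does not say which version is in use); the variable names are only illustrative:

```python
import locale
import sys

# Encoding that open() falls back to when no encoding= argument is given.
default_file_encoding = locale.getpreferredencoding(False)

# Encoding that print() uses when writing to the console (may be None if
# stdout has been replaced by an object without an encoding attribute).
console_encoding = getattr(sys.stdout, "encoding", None)

print("default file encoding:", default_file_encoding)
print("console encoding:", console_encoding)
```

If either value is not UTF-8, garbled accents like the ones above would be a plausible symptom.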
marin
  • You need to tell the csv reader that the file is UTF-8: https://stackoverflow.com/questions/904041/reading-a-utf8-csv-file-with-python#904085 – Martin Beckett Jul 23 '18 at 15:24
  • You might want to decode using `latin-1`. – rafaelc Jul 23 '18 at 15:30
  • You can create your dict directly, right after you create the variable `lematizer`: `dictionary = {}`, then `for row in lematizer:`, `k, v = row`, `dictionary[v] = k`. – Jones1220 Jul 23 '18 at 15:30
  • My first guess is that your console is not using UTF-8, so, while you actually are decoding things properly, you can’t display those characters on the console when you `print` them. But, rather than guess: (1) What Python version? (2) What platform are you on? (3) If Linux or *BSD, what terminal are you using, and what are your `LOCALE`/`LC_*` variables? – abarnert Jul 23 '18 at 15:42
  • It works! Thanks a lot to everyone! How can I close this question? For me everything is clear and everything works. – marin Jul 23 '18 at 15:53
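For later readers, here is a minimal sketch that combines the suggestions from the comments, assuming Python 3. The question does not say which suggestion fixed it, so this is one plausible reading, not the asker's final code. The helper names `load_lemmas` and `lemmatize` are made up for illustration, and an in-memory `io.StringIO` stands in for the real file so the example runs on its own:

```python
import csv
import io
import re


def load_lemmas(csv_file):
    """Map each inflected form (column 2) to its lemma (column 1)."""
    lemmas = {}
    for row in csv.reader(csv_file, delimiter=','):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        lemma, form = row[0], row[1]
        lemmas[form] = lemma
    return lemmas


def lemmatize(text, lemmas):
    """Replace every whole-word occurrence of a known form with its lemma."""
    if not lemmas:
        return text
    # re.escape keeps regex metacharacters in the CSV from being interpreted.
    pattern = re.compile(r'\b(%s)\b' % '|'.join(map(re.escape, lemmas)))
    return pattern.sub(lambda m: lemmas[m.group(1)], text)


# Stand-in for: open('fr.csv', encoding='utf-8', newline='')
sample_csv = io.StringIO("chat,chats\nnoir,noirs\nêtre,sont\nêtre,est\nfermer,fermés\n")
lemmas = load_lemmas(sample_csv)

# Multi-line text; "cinemas" is left alone because the sample CSV omits it.
print(lemmatize("chats sont noirs\ncinemas sont fermés", lemmas))
# chat être noir
# cinemas être fermer
```

With the real files, both `fr.csv` and `text.csv` would be opened with an explicit `encoding='utf-8'`, so that decoding does not depend on the platform default.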

0 Answers