I'm using Python 2 and I'm trying to put many words from a French dictionary into a set object, but I always run into an encoding problem with the words that contain accents.
This is my main code (this part reads a text file):
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
from sets import Set
with open('.../test_unicode.txt', 'r') as word:
    lines = word.readlines()
print(lines)
And this is the result of my print:
['\xc3\xa9l\xc3\xa9phants\n', 'bonjour\n', '\xc3\xa9l\xc3\xa8ves\n']
This is my text file for this example:
éléphants
bonjour
élèves
Next, this is the rest of my main code, which puts the words into a Python set:
dict_word = Set()
for line in lines:
    print(line)
    dict_word.add(line[:-1].upper())  # Get rid of the '\n'
print(dict_word)
This is the result of my print:
Set(['\xc3\xa9L\xc3\xa8VES', 'BONJOUR', '\xc3\xa9L\xc3\xa9PHANTS'])
What I want is this output:
Set(['ÉLÈVES', 'BONJOUR', 'ÉLÉPHANTS'])
But I can't figure out a way to get this result. I tried many things, including putting the line '# -*- encoding: utf-8 -*-' at the top of my file. I also tried 'with codecs.open()', but it didn't work either.
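For reference, here is a minimal sketch of one decode-first approach, assuming the file is UTF-8 encoded (which the '\xc3\xa9' bytes in the output above suggest). The idea is to read the file as unicode text rather than raw bytes, so that upper() can see the accented letters. The helper name load_upper_words and the temporary sample file are hypothetical stand-ins, not part of my real code, and the built-in set is used here instead of sets.Set.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def load_upper_words(path):
    """Read a UTF-8 text file and return its words upper-cased, as a set."""
    words = set()
    # io.open decodes the bytes to unicode text while reading
    # (available in both Python 2.6+ and Python 3).
    with io.open(path, 'r', encoding='utf-8') as handle:
        for line in handle:
            word = line.strip()  # drop the trailing '\n'
            if word:
                words.add(word.upper())
    return words


# Hypothetical sample file standing in for test_unicode.txt.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(sample_path, 'w', encoding='utf-8') as handle:
    handle.write(u'éléphants\nbonjour\nélèves\n')

dict_word = load_upper_words(sample_path)
os.remove(sample_path)

# Print the words one by one rather than printing the set itself.
for word in sorted(dict_word):
    print(word)
```

Note that in Python 2, printing the set itself still shows the repr of each element with escape sequences, so the accents only appear when each word is printed individually on a terminal that can display them.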
Thanks!