Hello StackOverflow community.
I am fairly new to Python, so sorry in advance if this is a silly question! I have been trying to fix this for hours and still haven't figured it out.
I am trying to import a large dataset of text to manipulate it in Python.
The dataset is a .csv file, and I've had trouble reading it because of encoding problems.
I have tried re-encoding the file as UTF-8 with Notepad++, and I have tried reading it with the csv.reader module in Python.
Here is an example of my code:
import csv

# sw (used below) is defined elsewhere in my script
with open('twitter_test_python.csv') as csvfile:
    #for file5 in csvfile:
    #    file5.readline()
    #csvfile = csvfile.encode('utf-8')
    spamreader = csv.reader(csvfile, delimiter=str(','), quotechar=str('|'))
    for row in spamreader:
        row = " ".join(row)
        row2 = str.split(row)
        listsw = []
        for mots in row2:
            if mots not in sw:
                del mots
        print row2
But when I import my data into Python, I still have encoding problems (accents, etc.) whichever method I use.
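To show the kind of garbling I mean, here is a small self-contained sketch. It is only an illustration under assumptions: it uses Python 3 syntax (my script above is Python 2), and the file name, the sample text, and the helper `read_rows` are all made up for the example. It writes a tiny UTF-8 CSV, then reads it back once as UTF-8 and once as Latin-1. The second read produces broken accents similar to what I see, though I don't know whether a wrong decoding is really the cause with my own file.

```python
import csv
import os
import tempfile


def read_rows(path, encoding):
    """Read every row of a comma-separated file using the given encoding."""
    with open(path, encoding=encoding, newline="") as csvfile:
        return list(csv.reader(csvfile, delimiter=","))


# Hypothetical sample data standing in for my real file.
sample = [["id", "text"], ["1", "café déjà vu"]]

# Write the sample to a temporary file, explicitly encoded as UTF-8.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample_utf8.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(sample)

# Decoding with the encoding the file was written in keeps the accents.
good = read_rows(path, "utf-8")

# Decoding the same bytes as Latin-1 turns each accented letter into two characters.
bad = read_rows(path, "latin-1")

print(good[1][1])
print(bad[1][1])
```

The only difference between the two reads is the `encoding` argument passed to `open`, so the accents seem to depend on the decoding step matching how the file was actually saved.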
How can I encode my data so that it can be read properly in Python?
Thanks!