I'm using pandas to load a CSV file containing Twitter messages:
corpus = pd.read_csv(data_path, encoding='utf-8')
Here is an example of the data:
label,date,comment
0,20120528192215Z,"""i really don't understand your point.\xa0 It seems that you are mixing apples and oranges."""
When I try to print the comment I get:
print(corpus.iloc[1]['comment'])
>> "i really don't understand your point.\xa0 It seems that you are mixing apples and oranges."
The \xa0 is still in the output. But if I paste the string from the file into my code and print it, I get the correct output:
print("""i really don't understand your point.\xa0 It seems that you are mixing apples and oranges.""")
>> i really don't understand your point. It seems that you are mixing apples and oranges.
I would like to know why the two outputs are different, and whether there is a way to get the string loaded by pandas to print correctly. I would prefer a better solution than just calling replace, since the data contains many other Unicode escape sequences such as \xe1, \u0111, \u01b0, \u1edd, etc.
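For reference, here is a minimal sketch that reproduces the difference and shows one possible way to decode the sequences. It assumes the file stores \xa0 as four literal characters (backslash, x, a, 0) rather than as an actual non-breaking space, which I have not confirmed. The in-memory CSV, the shortened comment text, and the helper name decode_escapes are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file. The raw string keeps \xa0 as
# literal characters, which is the assumption being tested here.
raw_csv = (
    "label,date,comment\n"
    r'0,20120528192215Z,"""mixing apples.\xa0 and oranges."""' + "\n"
)

corpus = pd.read_csv(io.StringIO(raw_csv))
comment = corpus.iloc[0]["comment"]


def decode_escapes(text: str) -> str:
    """Turn literal backslash escapes (\\xNN, \\uNNNN) into real characters.

    Encoding with 'backslashreplace' first keeps characters outside
    Latin-1 intact by converting them to escapes that are decoded back.
    """
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


decoded = decode_escapes(comment)

print(repr(comment))  # the backslash is still a literal character here
print(repr(decoded))  # the escape is now a real non-breaking space
```

If that assumption holds, the same helper could be applied to the whole column with corpus["comment"].map(decode_escapes). Note that it would also reinterpret any backslash in the data that was not meant as an escape.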