I am trying to get rid of escape sequences like \xa0, \xc2, etc. in my text data. I know this is an encoding problem, but how do I fix it? Neither the 'utf-8' nor the "ISO-8859-1" encoding option worked for me:
import pandas as pd
train = pd.read_csv('./data/train.csv', index_col=False, low_memory=False, encoding='utf-8')
test = pd.read_csv('./data/test.csv', index_col=False, low_memory=False, encoding='ISO-8859-1')
This is the output after running:
train = pd.DataFrame(data = train)
print(train)
   Insult  Date             Comment
1  0       20120528192215Z  "i really don't understand your point.\xa0 It ...
2  0       NaN              "A\\xc2\\xa0majority of Canadians can and has ...
3  0       NaN              "listen if you dont wanna get married to a man...
4  0       20120619094753Z  "C\xe1c b\u1ea1n xu\u1ed1ng \u0111\u01b0\u1edd...
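
For what it's worth, the printed output suggests the file may contain the escape sequences as literal text (a backslash followed by "xa0"), not as mis-decoded bytes. That would explain why changing the encoding argument has no effect. Below is a minimal sketch of one way to decode them, assuming that guess is right. The helper name unescape is made up for illustration, and turning the non-breaking space into a plain space is my own assumption about the desired result.

```python
def unescape(text):
    """Decode literal escapes such as \\xa0 or \\u1ea1 into real characters.

    Hypothetical helper: assumes the escapes are stored as plain text.
    """
    if not isinstance(text, str):
        return text  # leave NaN and other non-string values untouched
    try:
        # backslashreplace keeps any real non-ASCII characters intact by
        # turning them into escapes first; unicode_escape then decodes all.
        decoded = text.encode("ascii", "backslashreplace").decode("unicode_escape")
    except UnicodeDecodeError:
        return text  # malformed escape (e.g. a trailing backslash): keep as-is
    # Assumption: a non-breaking space should become an ordinary space.
    return decoded.replace("\xa0", " ")


samples = [
    "i really don't understand your point.\\xa0 It",
    "C\\xe1c b\\u1ea1n",
]
cleaned = [unescape(s) for s in samples]
```

On the frame above this could be applied with something like train['Comment'] = train['Comment'].map(unescape). Rows that are double-escaped (\\xc2\\xa0) would probably need a second pass, and the result of that pass may still need a latin-1 to utf-8 round trip, so I would check those rows by hand.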