pandas dataframe and u'\u2019'

Question

I have a pandas DataFrame (Python 2.7) containing a u'\u2019' character that prevents me from exporting my result to CSV:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 180: ordinal not in range(128)

Is there a way to query the DataFrame and substitute these characters with another one?

See http://stackoverflow.com/questions/3224268/python-unicode-encode-error and map the proper column with such encoding function — jarandaf, Jul 30 '15 at 16:32
AttributeError: 'DataFrame' object has no attribute 'encode' — Blue Moon, Jul 30 '15 at 16:37

score 1 · Answer 1 · answered Jul 30 '15 at 20:35

Try specifying a different encoding when saving to file. On Python 2.x, pandas defaults to ascii, which cannot represent non-ASCII characters such as u'\u2019'; that is why you get the error:

df.to_csv(path, encoding='utf-8')
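For the substitution the question asks about, a minimal sketch could look like the following. The helper name `replace_char`, the sample frame, and the choice of a plain apostrophe as the replacement are assumptions made for illustration; they do not come from the original post.

```python
# -*- coding: utf-8 -*-
import pandas as pd


def replace_char(df, old=u"\u2019", new=u"'"):
    """Return a copy of df with `old` replaced by `new` in every text cell."""
    def fix(value):
        # Only touch text values; numbers, NaN, etc. pass through unchanged.
        if isinstance(value, type(u"")):
            return value.replace(old, new)
        return value

    out = df.copy()
    for col in out.columns:
        out[col] = out[col].map(fix)
    return out


# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({"text": [u"it\u2019s fine", u"plain"], "n": [1, 2]})
cleaned = replace_char(df)
```

Because the helper works on a copy, the original frame keeps its u'\u2019' characters, and `cleaned` can then be exported on its own.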

jarandaf

I tried but it does not work. I get: UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 131152: invalid start byte – Blue Moon Jul 31 '15 at 07:49
Could you somehow provide us with the dataframe data? It's clear there are strange characters in your dataset which make writing to file fail. – jarandaf Jul 31 '15 at 08:15
It is confidential data that I cannot share. – Blue Moon Jul 31 '15 at 08:31
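One possible reading of the UnicodeDecodeError above is that some cells hold raw byte strings in a non-UTF-8 encoding: byte 0xa0 is not a valid UTF-8 start byte, but it is a non-breaking space in Latin-1. Under that assumption (the real encoding of the data is unknown), a sketch of converting such values to unicode before export might be as follows. The helper name `to_text` and the sample values are hypothetical.

```python
def to_text(value, encoding="latin-1"):
    """Decode byte strings to unicode text; return anything else unchanged."""
    # On Python 2, `bytes` is an alias of `str`, so plain strings are decoded too.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical sample values: a byte string containing 0xa0, text, and a number.
decoded = to_text(b"a\xa0b")
untouched_text = to_text(u"it\u2019s")
untouched_number = to_text(3)
```

Such a helper could be applied column by column, for example with `df[col].map(to_text)`, before calling `df.to_csv(path, encoding='utf-8')`.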

score 0 · Answer 2 · answered Jul 31 '15 at 08:33

I did not manage to export the whole file. However, I managed to identify the rows with the character causing problems and eliminate them:

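faulty_rows = []
for i in range(len(outcome)):
    try:
        # Write each row on its own; rows that cannot be encoded raise an error.
        outcome.iloc[i].to_csv("/Users/john/test/test.csv")
    except UnicodeError:
        faulty_rows.append(i)
        print(i)

# Drop the rows that failed, then export what is left.
tocsv = outcome.drop(outcome.index[faulty_rows])

tocsv.to_csv("/Users/john/test/test.csv")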
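The same rows can probably be located without writing each one to disk. The sketch below flags every row that contains a non-ASCII cell; the helper names `has_non_ascii` and `non_ascii_rows` and the sample frame are assumptions for illustration, not part of the original answer.

```python
# -*- coding: utf-8 -*-
import pandas as pd


def has_non_ascii(value):
    """True if value is a text or byte string with a character above 127."""
    if isinstance(value, bytes):
        return any(b > 127 for b in bytearray(value))
    if isinstance(value, type(u"")):
        return any(ord(ch) > 127 for ch in value)
    return False


def non_ascii_rows(df):
    """Return the index labels of rows holding at least one non-ASCII cell."""
    mask = pd.Series(False, index=df.index)
    for col in df.columns:
        mask = mask | df[col].map(has_non_ascii).astype(bool)
    return df.index[mask.values]


# Hypothetical sample data standing in for `outcome`.
sample = pd.DataFrame({"text": [u"ok", u"it\u2019s", u"fine"], "n": [1, 2, 3]})
bad = non_ascii_rows(sample)
kept = sample.drop(bad)
```

Unlike the loop above, this flags rows by inspecting the cell contents rather than by attempting an export, so it also reports rows that would encode fine under utf-8.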
