0

I have a pandas DataFrame (Python 2.7) containing the character u'\u2019', which prevents me from exporting my result to CSV.

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 180: ordinal not in range(128)

Is there a way to query the DataFrame and substitute these characters with another one?

jarandaf
Blue Moon

2 Answers

1

Try specifying a different encoding when saving to file. On Python 2.x, pandas defaults to ASCII when writing, which cannot represent non-ASCII characters such as u'\u2019'; that is why you get the error:

df.to_csv(path, encoding='utf-8')
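
To illustrate why the encoding matters, here is a minimal sketch using only plain Python strings (no pandas). The sample string `text` is made up for illustration; it just contains the same u'\u2019' character from the error message.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample value containing RIGHT SINGLE QUOTATION MARK (U+2019).
text = u"it\u2019s"

# ASCII only covers code points 0-127, so encoding U+2019 fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent every Unicode code point; U+2019 becomes three bytes.
utf8_bytes = text.encode("utf-8")
```

This is the same failure the ASCII default produces during export, which is why passing encoding='utf-8' is the usual first thing to try.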
jarandaf
  • I tried but it does not work. I get: UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 131152: invalid start byte – Blue Moon Jul 31 '15 at 07:49
  • Could you share some of the DataFrame data? It's clear there are strange characters in your dataset that make writing to file fail. – jarandaf Jul 31 '15 at 08:15
  • It is confidential data that I cannot share. – Blue Moon Jul 31 '15 at 08:31
0

I did not manage to export the whole file. However, I managed to identify the rows containing the character causing problems and drop them:

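faulty_rows = []
for i in range(len(outcome)):
    try:
        # Write each row on its own; a row with a problematic
        # character raises an encoding/decoding error.
        outcome.iloc[i].to_csv("/Users/john/test/test.csv")
    except UnicodeError:
        faulty_rows.append(i)
        print(i)

# Drop the faulty rows (looked up by position), then export the rest.
tocsv = outcome.drop(outcome.index[faulty_rows])

tocsv.to_csv("/Users/john/test/test.csv")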
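
As an alternative to dropping rows, the characters could be substituted, which is what the question originally asked for. Below is a rough sketch of a per-cell cleaning helper. The names `REPLACEMENTS` and `clean_text` are hypothetical, and decoding byte strings as Latin-1 is an assumption based on the 0xa0 byte in the error from the comments; the real source encoding of the data is unknown.

```python
# -*- coding: utf-8 -*-
# Hypothetical mapping; extend it with whatever characters show up in the data.
REPLACEMENTS = {
    u"\u2018": u"'",  # left single quotation mark
    u"\u2019": u"'",  # right single quotation mark
    u"\u201c": u'"',  # left double quotation mark
    u"\u201d": u'"',  # right double quotation mark
    u"\u00a0": u" ",  # no-break space (byte 0xa0 in Latin-1)
}


def clean_text(value, byte_encoding="latin-1"):
    """Replace known non-ASCII characters in a cell; leave non-text values alone."""
    # Assumption: raw byte strings are Latin-1, where every byte decodes.
    if isinstance(value, bytes):
        value = value.decode(byte_encoding)
    # Numbers, None, etc. pass through unchanged.
    if not isinstance(value, type(u"")):
        return value
    for bad, good in REPLACEMENTS.items():
        value = value.replace(bad, good)
    return value
```

Applied to every cell, for example with something like `outcome = outcome.applymap(clean_text)` (newer pandas versions provide `DataFrame.map` for this), the frame should then export without losing rows, provided the mapping covers every offending character.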
Blue Moon