I am starting to play with pandas.

I downloaded a Google Sheet as an Excel file.

When I read some data from that Excel file on Windows 7:

import pandas as pd

xls = pd.ExcelFile('C:/Users/file.xlsx')
data = xls.parse('Sheet 1', index_col=None, na_values=['NA'])
print "Data", data

I get:

Decode error - output not utf-8

The original excel file has text and numbers.

What is wrong?

Thanks,

Diego

2 Answers


Try passing a different encoding argument, such as iso-8859-1. The Internet Assigned Numbers Authority (IANA) maintains an exhaustive list of character sets. Even if the data looks like plain Latin text and numbers, a single character may require a different character set, depending on where the file originated.
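To illustrate how a single byte can read differently depending on the character set, here is a minimal sketch in plain Python 3. It does not involve pandas, and the byte value is chosen purely for illustration; it is not taken from the asker's file.

```python
# A single byte outside the ASCII range (illustrative value only).
raw = b"\x80"

# cp1252 (Windows-1252) maps 0x80 to the euro sign.
as_cp1252 = raw.decode("cp1252")

# iso-8859-1 (latin1) maps every byte to the code point with the same value,
# so 0x80 becomes an invisible C1 control character.
as_latin1 = raw.decode("iso-8859-1")

# UTF-8 rejects a lone 0x80 because it is a continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_cp1252), repr(as_latin1), utf8_ok)
```

The same byte yields a visible symbol, a control character, or an error, which is why text that looks ordinary can still fail under the wrong character set.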

You can also use either the two-step process with ExcelFile or the one-step process with read_excel:

ExcelFile

xls = pd.ExcelFile('C:/Users/file.xlsx')
data = xls.parse('Sheet 1', index_col=None, na_values=['NA'], encoding='iso-8859-1')
print data.head()

read_excel

data = pd.read_excel('C:/Users/file.xlsx', 'Sheet 1', encoding='iso-8859-1')
print data.head()
Parfait
  • Thank you for the answer. Unfortunately, none of these has worked so far. I will keep trying. – Diego Sep 30 '15 at 10:35
  • Try this list of popular encodings [here](http://stackoverflow.com/questions/8509339/what-is-the-most-common-encoding-of-each-language). Character sets usually depend on the language of the file's origin. – Parfait Sep 30 '15 at 13:52
  • Thank you for the list. The Google Sheet I am importing is mine. I think there might be a format issue when I download it as an Excel file to my PC. What do you think? – Diego Oct 01 '15 at 15:34

This happens because the encoding of your data changes from ASCII to latin1. Try the cp1252 encoding.
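One way to check which encoding fits is to try a short list of candidates on the raw bytes. The sketch below is plain Python 3; the helper name `first_working_encoding` and the sample bytes are hypothetical, made up for illustration rather than taken from the asker's data.

```python
def first_working_encoding(raw, candidates=("ascii", "utf-8", "cp1252", "iso-8859-1")):
    """Return (encoding, text) for the first candidate that decodes raw without error."""
    for enc in candidates:
        try:
            return enc, raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical sample: the word "cafe" with an accented e, encoded as cp1252.
sample = "caf\u00e9".encode("cp1252")

encoding, text = first_working_encoding(sample)
# ascii() keeps the output printable even on a console that cannot show the accent.
print(encoding, ascii(text))
```

Note that iso-8859-1 accepts every byte sequence, so it belongs last in the list, and a successful decode does not by itself prove that the encoding is the right one.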

Mohamed Thasin ah