how to save string in 'utf-8' format in a .xlsx using pandas to_exel (to_csv is able to save it as .csv)

Question

since I am able to save string using to_csv using 'utf-8' encoding, I am expecting to do be able to do the same using to_excel. This is not an issue on my side with encoding. None of the previous thread I saw, discuss this issue.

I am using python 2.7.12 on Windows 7(Anaconda) and pandas 0.18.1

I have 2 questions related to saving a panda dataframe containing special character (encoding as 'utf-8') as a .csv or .xlsx file.

For example:

import pandas as pd

# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Data': ['1', 'as', '?%','ä']})

I can save the dataframe as a .cvs file without any issue:

df.to_csv('test_csv.csv',sep=',', encoding='utf-8')

and it works. When importng the data I need to choose 'utf-8' in Excel and everything is fine.

Now if I try to save the same dataframe as an .xlsx then it doesn't work.

I have the following code:

 # Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter',   options={'encoding':'utf-8'})

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1',encoding='utf-8')
writer.save()

and I got the following error message:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

I am not 100% sure to use the correct option to set the encoding:

options={'encoding':'utf-8'}

and

encoding='utf-8'

since it is not clear to me how to proceed from the documentation.

Any idea how to have this working ?

Bonus question related to df.to_csv. Is there a way to use some special character as separator ? Some reason, the code I am migrated from R to python is using sep='¤'. tried to encode this special character in all possible way but is always failed. Is it possible to do that ?

Thanks a lot

Cheers

Fabien

Possible duplicate of [Pandas: save to excel encoding issue](http://stackoverflow.com/questions/34485982/pandas-save-to-excel-encoding-issue) — Fabio Lamanna, Oct 19 '16 at 08:05
favorite This question may already have an answer here: Pandas: save to excel encoding issue 2 answers since I am able to save the string using to_csv using 'utf-8' encoding, I am expecting to do be able to do the same using to_excel. This is not an issue on my side with encoding. None of the previous thread I saw, discuss this issue. Please correct me if I am wrong. — Dr. Fabien Tarrade, Oct 19 '16 at 08:25

score 1 · Accepted Answer · answered Oct 19 '16 at 09:34

1

If you are using xlsxwriter as the Excel writing engine then the encoding='utf-8' is ignored because the XlsxWriter module doesn't use it.

XlsxWriter requires that the string data is encoded as utf8. After that it handles the strings automatically.

So you will need to ensure that the string data you are writing is encoded as utf8 via Pandas: either when you read it or after the data is in the data frame.

answered Oct 19 '16 at 09:34

jmcnamara

38,196
6
90
108

Thanks @jmcnamara, this explain my issue. By fixing the encoding in the dataframe fix my issue. – Dr. Fabien Tarrade Oct 20 '16 at 12:43

how to save string in 'utf-8' format in a .xlsx using pandas to_exel (to_csv is able to save it as .csv)

1 Answers1