8

With:

# -*- coding: utf-8 -*-

at the top of my .ipynb, Jupyter is now displaying accented characters correctly.

When I export to csv (with .to_csv()) a pandas data frame containing accented characters:

[screenshot: the data frame's accented characters display correctly in the notebook]

... the characters do not render properly when the csv is opened in Excel.

[screenshot: the same characters appear garbled when the csv is opened in Excel]

This is the case whether I set encoding='utf-8' or not. Is pandas/Python doing all that it can here, and is this an Excel issue? Or can something be done before the export to csv?

  • Python: 2.7.10
  • Pandas: 0.17.1
  • Excel: Excel for Mac 2011
Pyderman
  • Most likely it's an Excel issue. Check the csv import options; one of them should be character encoding. – awm Mar 30 '16 at 02:31
  • It seems that `to_excel()` does not exhibit the same problem, though I'm forced to export as a .xls, which is not ideal. – Pyderman Mar 30 '16 at 02:35
  • I think you'll find the answer here (use the Import Text wizard): http://stackoverflow.com/questions/155097/microsoft-excel-mangles-diacritics-in-csv-files – awm Mar 30 '16 at 02:37

7 Answers

9

If you want to keep the accents, try exporting with encoding='iso-8859-1':

df.to_csv(path, encoding='iso-8859-1', sep=';')
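
For context, a minimal self-contained sketch of the idea (the column name, values, and file name below are made up for illustration):

# -*- coding: utf-8 -*-
import pandas as pd

# Made-up data frame containing accented characters
df = pd.DataFrame({u'ville': [u'Orléans', u'Besançon', u'Nîmes']})

# Latin-1 (iso-8859-1) covers Western European accents; characters outside
# that range would raise a UnicodeEncodeError on export
df.to_csv('villes.csv', encoding='iso-8859-1', sep=';', index=False)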
Juliana Rivera
3

I had a similar problem, also on a Mac. I noticed that the unicode string showed up fine when I opened the csv in TextEdit, but showed up garbled when I opened it in Excel.

Thus, I don't think there is any way to successfully export unicode to Excel with to_csv, but I'd expect the default to_excel writer to suffice.

df.to_excel('file.xlsx', encoding='utf-8')
Selah
3

I also ran into the same problem. When I checked the DataFrame in the Jupyter notebook, everything looked fine.

The problem happens when I try to open the file directly (since it has a .csv extension, Excel opens it without asking about the encoding).

The solution for me was to open a new blank Excel workbook and import the file from the "Data" tab, like this:

  • Import External Data
  • Import Data from text
  • Choose the file
  • In the import wizard window, where it says "File origin", choose "65001 : Unicode (UTF-8)" from the drop-down list

Then I just chose the right delimiter, and that was it for me.

2

I think using a different Excel writer helps; I recommend xlsxwriter:

import pandas as pd

df = ...  # your data frame

# write through the xlsxwriter engine instead of the default
writer = pd.ExcelWriter('file.xlsx', engine='xlsxwriter')
df.to_excel(writer)
writer.save()
Deo Leung
1

Maybe try this function on your columns if you can't get Excel to cooperate. It removes the accents using the unicodedata module:

import unicodedata

def remove_accents(input_str):
    # Only unicode strings can be normalized; other values pass through untouched
    if isinstance(input_str, unicode):  # Python 2 `unicode` type
        nfkd_form = unicodedata.normalize('NFKD', input_str)
        return u"".join(c for c in nfkd_form if not unicodedata.combining(c))
    return input_str
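
For illustration, a hedged sketch of applying it to a data frame column before export (df and the column name 'city' are assumptions, not from the question):

# Strip accents from a hypothetical 'city' column, then export as usual
df['city'] = df['city'].apply(remove_accents)
df.to_csv('out.csv')  # the cleaned text is plain ASCII, so Excel renders it fine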
Greg Friedman
  • I actually need to maintain the accents, though your function may come in handy in the future, thanks. Not sure if it's doing what is intended though. Tried a quick test: http://pastebin.com/gA8FuGDq – Pyderman Mar 30 '16 at 02:40
1

I had the same problem, and writing to .xlsx and renaming to .csv didn't solve the problem (for application-specific reasons I won't go into here), nor was I able to successfully use an alternate encoding as Juliana Rivera recommended. 'Manually' writing the data as text worked for me.

# `output` holds the rows; each cell is stringified and the row comma-joined
with open(RESULT_FP + '.csv', 'w+') as rf:
    for row in output:
        rf.write(','.join(map(str, row)) + '\n')
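
Note that on Python 2 (the version in the question), str() raises UnicodeEncodeError on unicode cells containing accents, so a variant that encodes explicitly might look like this (a sketch; the rows and file name are placeholders):

# -*- coding: utf-8 -*-
import io

output = [(u'Orléans', 1), (u'Besançon', 2)]  # hypothetical rows

# io.open lets us choose the encoding; every cell is converted to unicode
# before joining so accented text survives the write
with io.open('result.csv', 'w', encoding='utf-8') as rf:
    for row in output:
        rf.write(u','.join(unicode(cell) for cell in row) + u'\n')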

Sometimes I guess you just have to go back to basics.

pocreagan
1

I encountered a similar issue when attempting to read_json followed by a to_excel:

import pandas

df = pandas.read_json(myfilepath)
# causes garbled characters
df.to_excel(sheetpath, encoding='utf8')
# also causes garbled characters
df.to_excel(sheetpath, encoding='latin1')

Turns out, if I load the json manually with the json module first, and then export with to_excel, the issue doesn't occur:

import json

with open(myfilepath, encoding='utf8') as f:
    j = json.load(f)

df = pandas.DataFrame(j)
df.to_excel(sheetpath, encoding='utf8')
YenForYang