pandas: str.replace fails on Spanish characters

Question

I have a simple script that converts values from one type to another using str.replace. It works fine for values using standard English characters but it seems confused by Spanish characters like ñ.

Example:

df['Best time to call?'] = df['Best time to call?']\
    .str.replace('Afternoons / La tarde','afternoon')\
    .str.replace('Evenings / La noche','evening')\
    .str.replace('Do Not Call / No llamar','')\
    .str.replace('Morning / La mañana','morning')

The converted values are output to a CSV through pandas. When I open the CSV in Excel, 'Morning / La mañana' has been converted to 'Morning / La maï¿½ana' even though all the other ones worked.

EDIT

The older question you all have suggested was specific to python 2 and I am using python 3.

I have found that preemptively replacing the mojibake solves the issue.

df['Best time to call?'] = df['Best time to call?'].str.replace('�', 'n')

followed by:

df['Best time to call?'] = df['Best time to call?'.str.replace('Morning / La manana','morning')

Not the most elegant solution, but it will do for now.

Have you checked out the response here? I think it should help https://stackoverflow.com/questions/23839465/python-pandas-replace-special-character — Scinana, May 27 '21 at 14:41
Does this answer your question? [Python Pandas Replace Special Character](https://stackoverflow.com/questions/23839465/python-pandas-replace-special-character) — JMA, May 27 '21 at 14:43
This is not a failure. You have written the file out as UTF-8 but Excel thinks the encoding is Windows-1252. This has a name: *mojibake*, a Japanese name for a problem that first appeared with East Asian scripts but has now become a worldwide plague. What to do depends on what you want. If you want it to look right in Excel, open the file in your program with `encoding="cp1252"`. If you are just using Excel to inspect the file, use something else, such as Notepad++ (free), to view it. — BoarGules, May 27 '21 at 14:48
I always make a point to check for older answers before posting a question but somehow missed this one. Thanks. I'll check it out. — NWWPA, May 27 '21 at 14:48

score 1 · Answer 1 · answered May 27 '21 at 14:43

1

This is an Excel problem, not a Pandas problem.

Excel is interpreting your UTF-8 encoded CSV file as ISO-8859-1, leading to the mojibake you're seeing.

Either

use Excel's Open dialog -> Open Text File -> set the encoding to UTF-8
save your file as e.g. XLS, which is always opened with a correct encoding by Excel
save the CSV file as ISO-8859-1 instead of UTF-8 (not recommended; you can't represent all characters in ISO-8859-1, and many other software expects UTF-8)

answered May 27 '21 at 14:43

AKX

152,115
15
115
172

Thanks for you quick response, but I don't it's Excel. The code is supposed to convert 'Morning / La mañana' to 'morning,' so the character shouldn't even be there for Excel to misinterpret. – NWWPA May 27 '21 at 14:46

pandas: str.replace fails on Spanish characters

1 Answers1