
I'm having quite a lot of trouble understanding how pandas' .replace handles special characters.

I have a dataframe in which I need to replace some text with Greek letters. I have done this before in the same code and it worked perfectly, but the second time it did not work, for a reason I could not figure out.

import pandas as pd

df = pd.DataFrame({'a': ['Aa_alpha_bb', 'Cc_beta_dd', 'Ee_gamma_ff']})

# then I did:
df['a'].replace({'_alpha_': 'α', '_beta_': 'β', '_gamma_': 'γ'}, regex=True, inplace=True)

But I get the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)
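The byte 0xce in this message is the first byte of the UTF-8 encoding of α, β and γ alike. That hints that the Greek replacement strings reached pandas as UTF-8 bytes and were then implicitly decoded with the ASCII codec. As a rough illustration only (written for Python 3 and without pandas, so it does not follow the exact Python 2.7 code path), the same message can be produced directly:

```python
# Python 3 sketch: reproduce the error message without pandas.
greek = {"alpha": "α", "beta": "β", "gamma": "γ"}

# Each Greek letter encodes to two UTF-8 bytes, the first of which is 0xce.
encoded = {name: ch.encode("utf-8") for name, ch in greek.items()}
for name, raw in encoded.items():
    print(name, raw)  # e.g. alpha b'\xce\xb1'

# Decoding those bytes with the ASCII codec fails at byte 0, since 0xce > 127.
try:
    encoded["alpha"].decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```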

I have also tried df['a'].astype(str), but to no avail.

I have no experience with special characters and encodings in Python. I'm also new to Python 2.7, which the project I'm working on requires. Can someone help me?

snakecharmerb
Karol Duarte
  • Try making the first line `# -*- coding: UTF-8 -*-` – MDR Aug 09 '21 at 20:16
  • I can't reproduce this - what kind of environment are you in? If *nix, what is the value of the environment variables 'LC_ALL', 'LC_CTYPE', 'LANG', 'LANGUAGE'? Or are you on Windows? – snakecharmerb Aug 10 '21 at 04:35
  • @MDR If the issue was a missing encoding cookie there would be a `SyntaxError`, not a `UnicodeDecodeError`. – snakecharmerb Aug 10 '21 at 04:37

1 Answer


I am pretty sure this has to do with your file not being UTF-8 encoded. See the other Stack Overflow question: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1

Putting the following at the top of the file, with the coding line first:

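# coding: utf-8
import sys
reload(sys)  # Python 2 only: site.py deletes sys.setdefaultencoding at startup; reloading sys restores it
sys.setdefaultencoding('utf-8')  # implicit str <-> unicode conversions now use UTF-8 instead of ASCII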

should do the trick. In Python 3, the default encoding is already UTF-8.
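For comparison, here is a minimal pandas-free sketch in Python 3, where every str is Unicode text, of the per-value regex substitution that the question's replace call performs. The helper name replace_all is hypothetical, and it only approximates Series.replace(mapping, regex=True) by applying each pattern in turn with re.sub:

```python
import re
import sys

# In Python 3 the default encoding is UTF-8 and sys.setdefaultencoding no longer exists.
assert sys.getdefaultencoding() == "utf-8"

replacements = {"_alpha_": "α", "_beta_": "β", "_gamma_": "γ"}
values = ["Aa_alpha_bb", "Cc_beta_dd", "Ee_gamma_ff"]


def replace_all(text, mapping):
    # Hypothetical helper: apply each regex pattern in turn to one value.
    for pattern, repl in mapping.items():
        text = re.sub(pattern, repl, text)
    return text


result = [replace_all(v, replacements) for v in values]
print(result)  # ['Aaαbb', 'Ccβdd', 'Eeγff']
```

Because the patterns, the replacements and the values are all the same text type here, no implicit ASCII decoding takes place.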

Anton van der Wel