
I have a pandas DataFrame, read from an Excel file, as input to my program.

I would like to replace some non-ASCII characters in the pandas DataFrame.

import pandas as pd
XList=['Meßi','Ürik']
YList=['01.01.1970','01.01.1990']

df = pd.DataFrame({'X':XList,
                   'Y':YList})

      X           Y
0  Meßi  01.01.1970 
1  Ürik  01.01.1990

I would like to create some replacement rules, e.g. ß->ss and Ü->UE,

and get this:

       X           Y
0  Messi  01.01.1970 
1  UErik  01.01.1990

Note: I'm using Python 2.7.

UPDATE:

Solved using the answer below and by setting up Eclipse as follows:

1°: Changing the text file encoding in Eclipse to UTF-8.

How to: How to use Special Chars in Java/Eclipse

2°: Adding this as the first line of the script:

# -*- coding: UTF-8 -*- 

http://www.vogella.com/tutorials/Python/article.html
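
As a minimal sketch of what the two steps above lead to, a script saved as UTF-8 with the encoding declaration on its first line can contain the non-ASCII keys directly. The `u` prefixes are an assumption added here so that the literals are Unicode strings on Python 2.7 as well as on Python 3:

```python
# -*- coding: UTF-8 -*-
# The declaration above must be on the first (or second) line of the file,
# and the file itself must actually be saved as UTF-8.

# u'' literals are Unicode strings on both Python 2.7 and Python 3.3+
repl_dict = {u'ß': u'ss', u'Ü': u'UE'}

for k, v in sorted(repl_dict.items()):
    print(repr(k), repr(v))
```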

Hangon

1 Answer

One way would be to create a dict of replacements, iterate over its key/value pairs, and use str.replace on the rows that contain each key:

In [42]:

repl_dict = {'ß':'ss', 'Ü':'UE'}
for k,v in repl_dict.items():
    df.loc[df.X.str.contains(k), 'X'] = df.X.str.replace(pat=k,repl=v)
df

Out[42]:
       X           Y
0  Messi  01.01.1970
1  UErik  01.01.1990
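
A slightly shorter variant of the same idea, as a sketch only: `str.replace` leaves strings without a match unchanged, so the boolean mask is not strictly needed. This assumes Python 3 and a pandas version that supports the `regex` keyword of `Series.str.replace`; passing `regex=False` makes each key a literal substring rather than a pattern.

```python
import pandas as pd

df = pd.DataFrame({'X': ['Meßi', 'Ürik'],
                   'Y': ['01.01.1970', '01.01.1990']})

# Replacement rules from the question: ß -> ss, Ü -> UE
repl_dict = {'ß': 'ss', 'Ü': 'UE'}

for k, v in repl_dict.items():
    # regex=False treats k as a literal substring, not a regular expression
    df['X'] = df['X'].str.replace(k, v, regex=False)

print(df)
```

Column Y is never touched, so the date strings stay as they are.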

EDIT

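For editors that cannot save non-ASCII characters in the Python script, you can use Unicode escape sequences for the keys instead. On Python 2 the literals need a u prefix for the escapes to be interpreted; on Python 3 the prefix is optional:

In [72]:

repl_dict = {u'\u00DF':'ss', u'\u00DC':'UE'}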
for k,v in repl_dict.items():
    df.loc[df.X.str.contains(k), 'X'] = df.X.str.replace(pat=k,repl=v)
df

Out[72]:
       X           Y
0  Messi  01.01.1970
1  UErik  01.01.1990
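
When every rule replaces a single character, a translation table is another option. The sketch below is plain Python 3 and uses only the built-in `str.maketrans` and `str.translate`; the helper name `transliterate` and the list `names` are made up for illustration. The keys are written as Unicode escapes so the source file stays pure ASCII.

```python
# Unicode escapes keep the source file pure ASCII
rules = {'\u00DF': 'ss', '\u00DC': 'UE'}   # ß -> ss, Ü -> UE

# str.maketrans accepts a dict whose keys are single characters
table = str.maketrans(rules)

def transliterate(text):
    """Apply all single-character replacement rules in one pass."""
    return text.translate(table)

names = ['Me\u00DFi', '\u00DCrik']
result = [transliterate(n) for n in names]
print(result)  # ['Messi', 'UErik']
```

With a DataFrame, the same table can be passed to `df['X'].str.translate(table)`.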
EdChum
  • I can't type non-ASCII characters in my code in Eclipse. Do you know how I could rewrite this? repl_dict = {'ß':'ss', 'Ü':'UE'} – Hangon Dec 18 '14 at 11:28
  • I suggest you use a different editor. I'm using IPython, but you should be able to save Unicode characters in your Python script. Otherwise, look up how to enable Unicode encoding in Eclipse. – EdChum Dec 18 '14 at 11:30
  • Thank you! Problem solved with Eclipse. I updated the question. – Hangon Dec 18 '14 at 13:15