13

A common operation that I need to do with pandas is to read the table from an Excel file and then remove semicolons from all the fields. The columns are often in mixed data types and I run into AtributeError when trying to do something like this:

for col in cols_to_check:
    df[col] = df[col].map(lambda x: x.replace(';',''))

AttributeError: 'float' object has no attribute 'replace'

when I wrap it in str() before replacing I have problems with Unicode characters, e.g.

for col in cols_to_check:
    df[col] = df[col].map(lambda x: str(x).replace(';',''))

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

In excel this is a very simple operation, all it takes is to replace ; with an empty string. How can I do it similarly in pandas for entire dataframe, disregard of data types? Or am I missing something?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
MJB
  • 793
  • 1
  • 11
  • 25

1 Answers1

25

You can use DataFrame.replace and for select use subset:

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':['f;','d:','sda;sd'],
                   'D':['s','d;','d;p'],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df)
   A  B       C    D  E  F
0  1  4      f;    s  5  7
1  2  5      d:   d;  3  4
2  3  6  sda;sd  d;p  6  3

cols_to_check = ['C','D', 'E']

print (df[cols_to_check])
        C    D  E
0      f;    s  5
1      d:   d;  3
2  sda;sd  d;p  6

df[cols_to_check] = df[cols_to_check].replace({';':''}, regex=True)
print (df)
   A  B      C   D  E  F
0  1  4      f   s  5  7
1  2  5     d:   d  3  4
2  3  6  sdasd  dp  6  3
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    thanks, this worked! What I was missing was the `regex=True`. Without it `replace` searches for the entire string rather then partial (same as excel's match entire cell content option) – MJB Feb 09 '17 at 12:12
  • 2
    I would suggest one little update to your answer. In my question I mention that I have mixed data types in columns (not only between columns). But in your example you use consistent data types. Also you only check the columns with strings, but `df=df.replace({';':''}, regex=True)` works just as well and even if you change one of the columns to something like `['a;',1.02,'b']` – MJB Feb 09 '17 at 12:18
  • Yes, you are right. I test it, but I dont checnge it. – jezrael Feb 09 '17 at 12:20
  • I add column `E` what is int column. Thank you for suggestion. – jezrael Feb 09 '17 at 12:20
  • 1
    @jezrael Just noticed that this code does not work with `python-3.6` but does work with `python-2.7`. Is this a known issue? – Achintha Ihalage Aug 27 '19 at 16:14
  • @AchinthaIhalage - Not sure about it. But last version of pandas support only [`python 3`](https://pandas.pydata.org/pandas-docs/stable/install.html#python-version-support) – jezrael Aug 28 '19 at 04:52