
I have a Pandas DataFrame whose columns look something like this:

df:

Column0   Column1     Column2
'MSC'       '1'        'R2'
'MIS'       'Tuesday'  '22'
'13'        'Finance'  'Monday'

So these columns contain actual strings, but also numeric values (integers) stored in string format.

I found this nice post about the pd.to_numeric and astype() methods, but I can't see if or how I could use them in my case.

Using:

pd.to_numeric(df, errors = 'ignore')

just results in skipping the whole columns. Instead of skipping whole columns, I only want to skip the individual strings in those columns which can't be converted, move on to the next entry, and try to convert the next string.

So in the end, my dataframe would look like this:

df:

Column0   Column1     Column2
'MSC'       1          'R2'
'MIS'      'Tuesday'    22
 13        'Finance'  'Monday'

Is there maybe an efficient way to loop over these columns and achieve that?

Best regards, Jan

EDIT: Thanks for all your suggestions! Since I am still a Python beginner, @coldspeed's and @sacul's answers are easier for me to understand, so I will go with one of them!

JanB
    This just doesn't fit into the pandas model. You're looking for mixed types in a single column... unless you just prefer pandas syntax, you're losing any kind of efficiency gains you might get in data processing – roganjosh Dec 04 '18 at 23:07
    Do you still need to retain the strings that won't convert? If so, would moving them to a new column be acceptable? Mixed data types in a single column doesn't work. – Dan Dec 04 '18 at 23:08
    @roganjosh, although sometimes when you are using pandas to clean up data that you received with mixed types, this is relevant. – Nesha25 Feb 16 '22 at 23:40

4 Answers


100% agree with the comments: mixing dtypes in columns is a terrible idea, performance-wise.

For reference, however, I would do this with pd.to_numeric and fillna:

df2 = df.apply(pd.to_numeric, errors='coerce').fillna(df)
print(df2)
  Column0  Column1 Column2
0     MSC        1      R2
1     MIS  Tuesday      22
2      13  Finance  Monday

The resulting columns have object dtype, which is what allows the mixed values. You can see this when you extract the values:

print(df2.values.tolist())
[['MSC', 1.0, 'R2'], ['MIS', 'Tuesday', 22.0], [13.0, 'Finance', 'Monday']]
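If you want to reproduce this end-to-end, here is a self-contained sketch that rebuilds the example frame from the question (column names and values taken from there):

```python
import pandas as pd

# Rebuild the example frame from the question (all cells are strings)
df = pd.DataFrame({
    'Column0': ['MSC', 'MIS', '13'],
    'Column1': ['1', 'Tuesday', 'Finance'],
    'Column2': ['R2', '22', 'Monday'],
})

# errors='coerce' turns every non-convertible cell into NaN;
# fillna(df) then puts the original strings back into exactly those slots
df2 = df.apply(pd.to_numeric, errors='coerce').fillna(df)

print(df2.loc[0, 'Column1'])  # 1.0 (converted to a number)
print(df2.loc[1, 'Column1'])  # Tuesday (left as a string)
```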
cs95

I would apply pd.to_numeric with errors='coerce', and update the original dataframe according to the results (see caveats in comments):

# show original string type:
df.loc[0,'Column1']
# '1'

df.update(df.apply(pd.to_numeric, errors='coerce'))

>>> df
  Column0  Column1 Column2
0     MSC        1      R2
1     MIS  Tuesday      22
2      13  Finance  Monday

# show updated float type:
df.loc[0,'Column1']
# 1.0
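For completeness, the same steps as a runnable sketch (frame rebuilt from the question). The reason this works: DataFrame.update skips NaN cells in the incoming frame, so only the successful conversions overwrite the original strings.

```python
import pandas as pd

df = pd.DataFrame({
    'Column0': ['MSC', 'MIS', '13'],
    'Column1': ['1', 'Tuesday', 'Finance'],
    'Column2': ['R2', '22', 'Monday'],
})

# Failed conversions become NaN, and update() ignores NaN in the
# incoming frame, so the original strings survive where coercion failed
df.update(df.apply(pd.to_numeric, errors='coerce'))

print(df.loc[0, 'Column1'])  # 1.0
print(df.loc[1, 'Column1'])  # Tuesday
```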
sacuL

Or you could simply use the isnumeric() method of str. I like it because the syntax is clear, although, as coldspeed's comment points out, this can become very slow on a large df.

df = df.applymap(lambda x: int(x) if x.isnumeric() else x)

Example:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([['a','b','c'],['1','1a','c']],columns=['Col1','Col2','Col3'])

In [3]: df
Out[3]:
  Col1 Col2 Col3
0    a    b    c
1    1   1a    c

In [4]: df.Col1.map(lambda x: int(x) if x.isnumeric() else x)
Out[4]:
0    a
1    1
Name: Col1, dtype: object
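One caveat to be aware of: str.isnumeric only matches plain digit strings, so negative numbers and decimals would slip through unconverted:

```python
# isnumeric() is True only for plain digit strings; a sign or a
# decimal point makes it return False, so those cells stay strings
print('13'.isnumeric())   # True
print('-13'.isnumeric())  # False
print('2.5'.isnumeric())  # False
```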

Using to_numeric with errors='ignore':

df = df.applymap(lambda x: pd.to_numeric(x, errors='ignore'))
df
  Column0  Column1 Column2
0     MSC        1      R2
1     MIS  Tuesday      22
2      13  Finance  Monday
df.applymap(type)
                 Column0                Column1                Column2
0          <class 'str'>  <class 'numpy.int64'>          <class 'str'>
1          <class 'str'>          <class 'str'>  <class 'numpy.int64'>
2  <class 'numpy.int64'>          <class 'str'>          <class 'str'>
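Note that errors='ignore' has since been deprecated in newer pandas versions. If you need the same per-element behaviour going forward, a small try/except wrapper is one way to sketch it (the helper name here is my own, not a pandas API):

```python
import pandas as pd

def to_number_or_keep(x):
    # Per-element stand-in for to_numeric(..., errors='ignore'):
    # return the converted number, or the original value on failure
    try:
        return pd.to_numeric(x)
    except (ValueError, TypeError):
        return x

df = pd.DataFrame({'Column2': ['R2', '22', 'Monday']})
out = df['Column2'].map(to_number_or_keep)
print(out.tolist())
```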
BENY