5

I have a pandas DataFrame whose numeric values are stored as strings:

import numpy as np
import pandas as pd
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'), index=['1', '2', '3', '4'])
frame = frame.apply(lambda x: x.astype(str))

This gives me a dataframe:

     A      B      C
1 -0.890  0.162  0.477
2 -1.403  0.160 -0.570
3 -1.062 -0.577 -0.370
4  1.142  0.072 -1.732

If I check frame.dtypes, every column is reported as object. Now I want to convert columns 'B' through 'C' to numbers.

Imagine that I have dozens of columns, so I would like to select them with a slice rather than list them one by one. So what I do is:

frame.loc[:,'B':'C'] = frame.loc[:,'B':'C'].apply(lambda x: pd.to_numeric(x, errors='coerce'))

If I just wanted to alter column, say, B, I would type:

frame['B'] = frame['B'].apply(lambda x: pd.to_numeric(x, errors='coerce'))

and that would convert B into float64. BUT if I use it with .loc, the dtypes are unchanged when I call frame.info() afterwards!

Can someone help me? Of course I could type out all the column names, but I would like a more practical approach.

jpp
Max Grinkov
  • Related: [pandas: to_numeric for multiple columns](https://stackoverflow.com/questions/36814100/pandas-to-numeric-for-multiple-columns) – jpp Mar 28 '18 at 16:42

2 Answers

9

You can pass kwargs to apply

Inline with assign

frame.assign(**frame.loc[:, 'B':'C'].apply(pd.to_numeric, errors='coerce'))

                 A         B         C
1   -1.50629471392 -0.578600  1.651437
2   -2.42667924339 -0.428913  1.265936
3  -0.866740402265 -0.678886 -0.094709
4    1.49138962612 -0.638902 -0.443982
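Note that assign returns a new DataFrame and leaves the original untouched, so the result has to be bound to a name to keep the conversion. A minimal self-contained sketch of that, rebuilding the frame from the question (the name `converted` is only illustrative):

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: every column holds strings.
frame = pd.DataFrame(
    np.random.randn(4, 3), columns=list("ABC"), index=["1", "2", "3", "4"]
)
frame = frame.astype(str)

# apply forwards errors="coerce" to pd.to_numeric for each selected column;
# assign then returns a copy of frame with B and C replaced.
converted = frame.assign(
    **frame.loc[:, "B":"C"].apply(pd.to_numeric, errors="coerce")
)

print(frame.dtypes)      # original frame: B and C are still strings
print(converted.dtypes)  # new frame: B and C are float64
```

Unpacking with `**` relies on the column labels being strings, since they become keyword arguments to assign.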

In Place with update

frame.update(frame.loc[:, 'B':'C'].apply(pd.to_numeric, errors='coerce'))
frame

                 A         B         C
1   -1.50629471392 -0.578600  1.651437
2   -2.42667924339 -0.428913  1.265936
3  -0.866740402265 -0.678886 -0.094709
4    1.49138962612 -0.638902 -0.443982
piRSquared
  • Dear piRsquared, thanks a lot! The following code worked for me: frame = frame.assign(**frame.loc[:, 'B':'C'].apply(pd.to_numeric, errors='coerce')) – Max Grinkov Mar 28 '18 at 17:09
  • second method with update does not work for me. Possibly because of [this issue](https://github.com/pandas-dev/pandas/issues/4094) about df.update not conserving dtypes – M. Schlenker Feb 11 '22 at 14:33
5

You can generate a list of columns as follows:

In [96]: cols = frame.columns.to_series().loc['B':'C'].tolist()

and use this variable for selecting "columns of interest":

In [97]: frame[cols] = frame[cols].apply(lambda x: pd.to_numeric(x, errors='coerce'))

In [98]: frame.dtypes
Out[98]:
A     object
B    float64
C    float64
dtype: object
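As a possible explanation for the difference: assigning through frame[cols] replaces the selected columns, so they can take a new dtype, whereas frame.loc[:, 'B':'C'] = ... appears to write the values into the existing object columns, which keep their dtype. The sketch below is a self-contained variant of the approach above; deriving `cols` from the label slice via `.columns` is an alternative I am assuming is acceptable here, not the method shown in this answer.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: every column holds strings.
frame = pd.DataFrame(
    np.random.randn(4, 3), columns=list("ABC"), index=["1", "2", "3", "4"]
)
frame = frame.astype(str)

# Resolve the label slice 'B':'C' to concrete column labels.
cols = frame.loc[:, "B":"C"].columns

# Column-wise assignment replaces B and C with the converted columns.
frame[cols] = frame[cols].apply(pd.to_numeric, errors="coerce")

print(frame.dtypes)
```

With dozens of columns, only the two slice endpoints need to be spelled out.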
MaxU - stand with Ukraine