Converting non-numeric columns to numeric columns

Question

My imports are:

import pandas as pd
import numpy as np
from pandas.api.types import is_numeric_dtype

I created a pandas dataframe (named df) that looks like this:

  state  initial_temp  final_temp
0  Cold          48.0        88.1
1   hot          80.7        30.0
2   hot         140.2        25.0
3   hot          59.8        25.0
4   hot          80.0        25.0

All the columns have dtype object; however, the only column that should have that dtype is the state column. I am trying to convert the columns that actually hold numbers (initial_temp and final_temp) to numeric dtypes and leave the state column untouched. This is mainly for pedagogical purposes.

My current attempt at this is:

def datatype_converter(df):
    col_list = []
    for column in df.columns:
        col_list.append(column)
        for i in range(len(col_list)):
            if is_numeric_dtype(df[col_list.pop()]):
                df.apply(pd.to_numeric, errors = 'coerce') # coerce invalid values to nan. 
            else:
                pass
    return df

Onyambu · Accepted Answer · 2021-05-25T01:26:55.847

2

You could do

df.transform(pd.to_numeric, errors = 'ignore')
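Two caveats on the one-liner: transform returns a new DataFrame, so the result has to be assigned back, and errors='ignore' has been deprecated since pandas 2.2 in favour of catching the exception explicitly. A minimal sketch of the same idea without that option, assuming the numbers are stored as strings in object columns; to_numeric_where_possible is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

def to_numeric_where_possible(df):
    """Return a copy in which every fully convertible column is numeric."""
    out = df.copy()
    for col in out.columns:
        try:
            # Raises if any value in the column cannot be parsed as a number.
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # Non-numeric column (e.g. 'hot' / 'Cold'): leave it unchanged.
            pass
    return out

# Assumed reconstruction of the frame from the question (values as strings).
df = pd.DataFrame({
    "state": ["Cold", "hot", "hot", "hot", "hot"],
    "initial_temp": ["48.0", "80.7", "140.2", "59.8", "80.0"],
    "final_temp": ["88.1", "30.0", "25.0", "25.0", "25.0"],
})

converted = to_numeric_where_possible(df)
print(converted.dtypes)
```

This does not depend on the name of the text column; any column containing a value that fails to parse is simply skipped.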

edited May 25 '21 at 01:26

answered May 25 '21 at 01:02

Onyambu


Thanks a lot! This helps with improving my programming skills. However, it's not really what I'm looking for. I'm trying to solve this with the assumption that there might not be a consensus name for the state column, just that its contents are either 'Hot' or 'Cold'. Hence, I'm trying to 'automate' the process. – Caesar May 25 '21 at 01:23
@Caesar check the edit – Onyambu May 25 '21 at 01:26

score 0 · Answer 2 · answered May 25 '21 at 00:33


You can use astype on the appropriate columns like this:

df[["initial_temp", "final_temp"]] = df[["initial_temp", "final_temp"]].astype(float)

Here df is your DataFrame. Of course, this assumes that all the values in those columns are numeric.
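To illustrate that assumption, here is a small sketch contrasting astype(float), which raises on an unparseable value, with pd.to_numeric(..., errors='coerce'), which turns that value into NaN. The sample Series and the 'n/a' placeholder are made up for the example:

```python
import pandas as pd

# Hypothetical column with one value that is not a number.
s = pd.Series(["48.0", "80.7", "n/a"], dtype=object)

try:
    s.astype(float)
    astype_failed = False
except ValueError:
    # astype refuses the whole column when a single value cannot be parsed.
    astype_failed = True

# to_numeric with errors='coerce' converts what it can and marks the rest NaN.
coerced = pd.to_numeric(s, errors="coerce")

print(astype_failed)
print(coerced)
```

Which behaviour is preferable depends on whether a bad value should stop the conversion or be recorded as missing.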


Chicodelarose


This works; however, it's not really what I'm looking for. I'm trying to do it with the assumption that I don't know how many columns there are. Hence, I'm trying to 'automate' the process. – Caesar May 25 '21 at 00:39

score 0 · Answer 3 · answered May 25 '21 at 00:41


This seems awfully complicated. How about

target_cols = [col for col in df.columns if is_numeric_dtype(df[col])]
for col in target_cols:
    df.loc[:, col] = pd.to_numeric(df[col])

I'm sure there's a savvier way to get this down to one statement and avoid iterating over columns (applying a boolean mask instead), but this is readable and concise.
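One caveat worth checking: is_numeric_dtype inspects a column's dtype, not its contents, so on a frame whose columns are all object the selection above picks nothing. A sketch of a content-based selection instead, assuming the numbers are stored as strings; treating a column as numeric only when every value parses is my assumption, and it would skip columns that already contain missing values:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Assumed small reproduction: every column stored with dtype object.
df = pd.DataFrame(
    {"state": ["Cold", "hot"], "initial_temp": ["48.0", "80.7"]},
    dtype=object,
)

# dtype-based selection: object columns are never reported as numeric.
dtype_based = [col for col in df.columns if is_numeric_dtype(df[col])]

# Content-based selection: keep columns where every value parses as a number.
content_based = [
    col for col in df.columns
    if pd.to_numeric(df[col], errors="coerce").notna().all()
]

# Replace the selected columns with their numeric versions.
df[content_based] = df[content_based].apply(pd.to_numeric)

print(dtype_based)
print(content_based)
print(df.dtypes)
```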


William Bradley


Thanks a lot! This helps with improving my programming skills. However, it doesn't seem to solve the problem. All the columns are still objects. Is there something I'm missing/not understanding? – Caesar May 25 '21 at 01:20
It works fine for me, converting (in your example) initial_temp and final_temp to float64 dtype. – William Bradley May 25 '21 at 13:33
