1

My imports are:

import pandas as pd
import numpy as np
from pandas.api.types import is_numeric_dtype

I created a pandas dataframe (named df) that looks like this:

  state  initial_temp  final_temp
0  Cold          48.0        88.1
1   hot          80.7        30.0
2   hot         140.2        25.0
3   hot          59.8        25.0
4   hot          80.0        25.0

All the columns have dtype object; however, the only column that should have that dtype is the state column. I am trying to convert the columns that actually hold numbers (initial_temp and final_temp) to numeric dtypes and leave the state column untouched. This is mainly for pedagogical purposes.
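
For anyone who wants to reproduce this, here is a minimal sketch of such a frame. It assumes the numbers ended up as strings (the post does not say how the object dtype arose), and it passes dtype=object so that every column is object regardless of pandas version.

```python
import pandas as pd

# Assumption: the temperatures were read in as strings. dtype=object forces
# every column to the object dtype, matching the situation described above.
df = pd.DataFrame(
    {
        "state": ["Cold", "hot", "hot", "hot", "hot"],
        "initial_temp": ["48.0", "80.7", "140.2", "59.8", "80.0"],
        "final_temp": ["88.1", "30.0", "25.0", "25.0", "25.0"],
    },
    dtype=object,
)

print(df.dtypes)  # every column reports object
```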

My current attempt at this is:

def datatype_converter(df):
    col_list = []
    for column in df.columns:
        col_list.append(column)
        for i in range(len(col_list)):
            if is_numeric_dtype(df[col_list.pop()]):
                df.apply(pd.to_numeric, errors = 'coerce') # coerce invalid values to nan. 
            else:
                pass
    return df
Henry Ecker
Caesar

3 Answers

2

You could do

df.transform(pd.to_numeric, errors = 'ignore')
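
One caveat: as far as I know, errors='ignore' has been deprecated in recent pandas releases (2.2 onward), so it may warn or fail depending on your version. A rough equivalent is to attempt the conversion per column and keep the column unchanged if it fails. The helper name to_numeric_or_keep and the sample frame below are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def to_numeric_or_keep(col):
    """Return the column as numeric if every value parses, else unchanged."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col


# Hypothetical sample mirroring the question: all columns start as object.
df = pd.DataFrame(
    {
        "state": ["Cold", "hot", "hot"],
        "initial_temp": ["48.0", "80.7", "140.2"],
        "final_temp": ["88.1", "30.0", "25.0"],
    },
    dtype=object,
)

# apply calls the helper once per column; the result is a new DataFrame,
# so it has to be assigned to something.
converted = df.apply(to_numeric_or_keep)
print(converted.dtypes)
```

Note that the result is a new DataFrame, so it has to be assigned back (df = df.apply(...)) for the change to stick.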
Onyambu
  • Thanks a lot! This helps with improving my programming skills. However, it's not really what I'm looking for. I'm trying to solve this with the assumption that there might not be a consensus name for the state column, just that its contents are either 'Hot' or 'Cold'. Hence, I'm trying to 'automate' the process. – Caesar May 25 '21 at 01:23
  • @Caesar check the edit – Onyambu May 25 '21 at 01:26
0

You can use astype on the appropriate columns like this:

df[["initial_temp", "final_temp"]] = df[["initial_temp", "final_temp"]].astype(float)

Where df is your DataFrame. Of course, this makes the assumption that all the values in those columns are numeric.
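
To illustrate that assumption, here is a small sketch with a made-up column containing one non-numeric entry. astype(float) raises on it, whereas pd.to_numeric with errors='coerce' turns the bad entry into NaN instead.

```python
import pandas as pd

# Hypothetical column with one value that cannot be parsed as a number.
temps = pd.Series(["48.0", "80.7", "n/a"], dtype=object)

# astype(float) fails as soon as it meets a value it cannot parse.
try:
    temps.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors="coerce" replaces unparseable values with NaN.
coerced = pd.to_numeric(temps, errors="coerce")
print(astype_failed)
print(coerced)
```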

Chicodelarose
  • This works; however, it's not really what I'm looking for. I'm trying to do it with the assumption that I don't know how many columns there are. Hence, I'm trying to 'automate' the process. – Caesar May 25 '21 at 00:39
0

This seems awfully complicated. How about

target_cols = [col for col in df.columns if 
    is_numeric_dtype(df[col])
]
for col in target_cols:
    df.loc[:, col] = pd.to_numeric(df[col])

I'm sure there's a savvier way to get this down to one statement and avoid iterating over columns (applying a boolean mask instead), but this is readable and concise.
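
A caveat worth flagging: is_numeric_dtype inspects the column's dtype, not its values, so it returns False for object columns even when they hold number-like strings, and target_cols would then come out empty. Also, in recent pandas versions df.loc[:, col] = ... may write into the existing object column rather than replace it. A hedged alternative is to test whether the values convert cleanly and assign with df[col] = .... The helper looks_numeric and the sample frame are hypothetical, and the sketch assumes the numbers are stored as strings.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample mirroring the question: all columns start as object.
df = pd.DataFrame(
    {
        "state": ["Cold", "hot", "hot"],
        "initial_temp": ["48.0", "80.7", "140.2"],
        "final_temp": ["88.1", "30.0", "25.0"],
    },
    dtype=object,
)

# The dtype-based check says "not numeric" for every object column.
flags_before = {col: is_numeric_dtype(df[col]) for col in df.columns}


def looks_numeric(col):
    """True if coercing to numeric introduces no new missing values."""
    converted = pd.to_numeric(col, errors="coerce")
    return bool(converted.notna().sum() == col.notna().sum())


target_cols = [col for col in df.columns if looks_numeric(df[col])]
for col in target_cols:
    # Plain column assignment replaces the column, so the new dtype sticks.
    df[col] = pd.to_numeric(df[col])

print(target_cols)
print(df.dtypes)
```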

William Bradley
  • Thanks a lot! This helps with improving my programming skills. However, it doesn't seem to solve the problem. All the columns are still objects. Is there something I'm missing/not understanding? – Caesar May 25 '21 at 01:20
  • It works fine for me, converting (in your example) initial_temp and final_temp to float64 dtype. – William Bradley May 25 '21 at 13:33