7

I have been stuck on this for a while and no amount of googling seems to help.

I am reading in a lot of raw data. Some of the variables come in as objects because the source uses letters to mark various kinds of missing values (which I do not care about).

So I want to run a fairly large subset of columns through pandas.to_numeric(___, errors='coerce') just to force these to be cast as int or float (again, I do not care too much which, just that they are numeric).

I can make this happen column by column easily:

df['col_name'] = pd.to_numeric(df['col_name'], errors='coerce') 

However, I have some 60 columns I want to cast like this, so I thought this would work:

numeric = ['lots', 'a', 'columns']
for item in numeric:
    df_[item] = pd.to_numeric(df[item], errors='coerce')

The error I get is:

Traceback (most recent call last):

File "/Users/____/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)

File "<ipython-input-53-43b873fbd712>", line 2, in <module>
df_detail[item] = pd.to_numeric(dfl[item], errors='coerce')

File "/Users/____/anaconda/lib/python2.7/site-packages/pandas/tools/util.py", line 101, in to_numeric
raise TypeError('arg must be a list, tuple, 1-d array, or Series')

TypeError: arg must be a list, tuple, 1-d array, or Series

I tried many versions. This has something to do with the list or with looping through it. I get the very same error when the for-loop simply calls df(item).describe()

From my (still novice) understanding of Python, this should work. I am at a loss. Thanks

dozyaustin
Have a look at [`applymap`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.applymap.html) and be sure to give meaningful return values (i.e. give back the original value if it could not be converted). – Jan Oct 01 '16 at 05:32
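The cause of the traceback is not confirmed anywhere in this thread. One situation that is known to raise this exact TypeError is a duplicated column label: `df[item]` then returns a DataFrame rather than a Series, and `pd.to_numeric` rejects 2-d input. The following is only a diagnostic sketch under that assumption; the frame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a duplicated column label 'a'.
df = pd.DataFrame([[1, 'x', 'b'], [2, 'y', 'c']], columns=['a', 'a', 'c'])

# With a duplicated label, selecting it returns a DataFrame, not a Series.
selected = df['a']
is_frame = isinstance(selected, pd.DataFrame)

# pd.to_numeric only accepts 1-d input, so a DataFrame raises TypeError.
try:
    pd.to_numeric(selected, errors='coerce')
    raised = False
except TypeError:
    raised = True

# List the duplicated labels so they can be renamed or dropped.
duplicates = df.columns[df.columns.duplicated()].tolist()

# Keeping only the first occurrence of each label makes df['a'] a Series again.
deduped = df.loc[:, ~df.columns.duplicated()]
converted = pd.to_numeric(deduped['a'], errors='coerce')
```

If `duplicates` comes back empty on the real data, the cause lies elsewhere and this check can be discarded.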

3 Answers

8

First of all, see this answer

# Let
numeric = ['lots', 'a', 'columns']

Option 1

df[numeric] = df[numeric].apply(pd.to_numeric, errors='coerce')

Option 2

df.loc[:, numeric] = pd.to_numeric(df[numeric].values.ravel(), 'coerce') \
                       .reshape(-1, len(numeric))

Demonstration
Consider the dataframe df

import pandas as pd

df = pd.DataFrame([
    [1, 'a', 2],
    ['b', 3, 'c'],
    ['4', 'd', '5']
], columns=['A', 'B', 'C'])

Then both options above yield

[image: the resulting dataframe]
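Since the screenshot does not survive as text, here is a small runnable sketch of the demonstration. It applies both options to the frame above, treating all three columns as the `numeric` list (an assumption made for this example). Option 2 is rebuilt as a fresh DataFrame instead of being assigned through `.loc`, so the resulting dtype does not depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame([
    [1, 'a', 2],
    ['b', 3, 'c'],
    ['4', 'd', '5']
], columns=['A', 'B', 'C'])
numeric = ['A', 'B', 'C']  # assumption: convert every column in this demo

# Option 1: convert column by column via apply.
opt1 = df.copy()
opt1[numeric] = opt1[numeric].apply(pd.to_numeric, errors='coerce')

# Option 2: flatten to 1-d, convert once, then restore the original shape.
flat = pd.to_numeric(df[numeric].values.ravel(), errors='coerce')
opt2 = pd.DataFrame(flat.reshape(-1, len(numeric)),
                    index=df.index, columns=numeric)

print(opt1)
#      A    B    C
# 0  1.0  NaN  2.0
# 1  NaN  3.0  NaN
# 2  4.0  NaN  5.0
```

Every entry that cannot be parsed as a number becomes NaN, which is why the columns end up as float even where the surviving values are whole numbers.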

piRSquared
0

How 'bout this:

df = df.apply(pd.to_numeric, errors='coerce')
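One thing to keep in mind with the whole-frame version: it coerces every column, so a column that is meant to stay text turns entirely into NaN. A minimal sketch of the difference, using a made-up two-column frame:

```python
import pandas as pd

# Hypothetical data: 'name' should stay text, 'value' should become numeric.
df = pd.DataFrame({'name': ['x', 'y'], 'value': ['1', 'oops']})

# Whole-frame conversion: the text column is coerced to NaN as well.
everything = df.apply(pd.to_numeric, errors='coerce')

# Converting only the chosen column leaves the text column untouched.
subset = df.copy()
subset['value'] = pd.to_numeric(subset['value'], errors='coerce')
```

So this works well when every column should be numeric. Otherwise, restricting the call to a list of columns is the safer choice.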
SebMa
0

This function takes two data frames: df is the actual data, and df_data_type lists each feature name (first column) together with its expected dtype kind (second column).

import pandas as pd

def check_change_data_type(df, df_data_type):
    for i in range(len(df_data_type)):
        name = df_data_type.iloc[i, 0]   # feature name
        kind = df_data_type.iloc[i, 1]   # expected dtype kind, e.g. 'f' or 'i'
        for col in df.columns:
            if name != col:
                continue
            if kind == df[col].dtype.kind:
                break  # column already has the expected kind
            print("Data Type is not equal", col, df[col].dtype.kind, kind)
            if kind == 'f' or (kind == 'i' and df[col].dtype.kind != 'f'):
                # Strip everything except letters, digits and whitespace, then convert.
                # Note: this pattern also removes '.' and '-', so '1.5' becomes '15'.
                # regex=True is required on pandas >= 2.0, where literal matching is the default.
                df[col] = df[col].str.replace(r'[^A-Za-z0-9\s]+', '', regex=True)
                df[col] = pd.to_numeric(df[col], errors='coerce')
            elif kind == 'i' and df[col].dtype.kind == 'f':
                # Already float: no string cleanup needed.
                df[col] = pd.to_numeric(df[col], errors='coerce')
    return df
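The same idea can be sketched more compactly when the expected kinds are available as a plain dict. The helper name `coerce_columns` and the `expected_kinds` mapping are hypothetical, introduced only for this illustration, and the string cleanup step from the function above is left out on purpose.

```python
import pandas as pd

def coerce_columns(df, expected_kinds):
    """Hypothetical helper: coerce columns whose dtype kind differs from the expected one."""
    out = df.copy()  # leave the caller's frame untouched
    for col, kind in expected_kinds.items():
        if col not in out.columns or out[col].dtype.kind == kind:
            continue  # unknown column, or already the expected kind
        if kind in ('i', 'f'):
            # Unparseable entries become NaN, so an 'i' column may end up as float.
            out[col] = pd.to_numeric(out[col], errors='coerce')
    return out

# Made-up example data.
raw = pd.DataFrame({'price': ['1.5', 'n/a', '3'], 'label': ['a', 'b', 'c']})
clean = coerce_columns(raw, {'price': 'f', 'label': 'O'})
```

Here `clean['price']` is float with NaN in place of `'n/a'`, while `label` is returned unchanged.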
kamran kausar