0

I have a data frame in which every column has dtype object. I want to convert all of them to numeric types using astype(), but without listing every column like this:

df.astype({'col1': 'int32' , 'col2' : 'int32' ....})

If I try to call it through apply instead (the code was posted as a screenshot that is not reproduced here), I get an error because the apply function needs a Series to traverse.

PS: Another way to do the same thing is:

df.apply(pd.to_numeric)

But I want to do this using .astype(). Is there a way to convert all object-type columns to numeric with df.astype() alone, without going through df.apply()?

3 Answers

1

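Use df = df.astype(int) to convert all columns to the int datatype. This only works if every value in every column can be parsed as an integer; otherwise it raises a ValueError. To pick a specific width, pass a NumPy type: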

import numpy

df.astype(numpy.int32)
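As a minimal sketch of the call above, assuming a made-up frame (the column names and values are hypothetical) whose object columns hold only integer-like strings:

```python
import numpy as np
import pandas as pd

# Hypothetical data: every column is dtype object and holds integer-like strings.
df = pd.DataFrame(
    {"col1": ["1", "2", "3"], "col2": ["10", "20", "30"]},
    dtype=object,
)

# A single dtype applies to every column, so no per-column dict is needed.
converted = df.astype(np.int32)
print(converted.dtypes)

# A value that cannot be parsed makes the whole call fail.
bad = pd.DataFrame({"col1": ["1", "read"]}, dtype=object)
try:
    bad.astype(np.int32)
    failed = False
except ValueError:
    failed = True
print("astype raised ValueError:", failed)
```

If some columns may contain non-numeric text, the call fails as a whole rather than skipping those columns.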
bigbounty
1

If these are object columns and you're certain they can be "soft-casted" to int, you have two options:

df
  worker day    tasks
0      A   2     read
1      A   9    write
2      B   1     read
3      B   2    write
4      B   4  execute

df.dtypes

worker    object
day       object
tasks     object
dtype: object

pandas <= 0.25

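infer_objects (0.21+ only) performs a soft conversion: an object column is cast to a NumPy type when the values it holds already have that type (for example Python ints stored in an object column).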

df.infer_objects().dtypes

worker    object
day        int64
tasks     object
dtype: object
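To illustrate what "soft" means here, a small sketch with hypothetical data: one object column holds real Python ints, another holds the same numbers as strings. Only the first should become an integer column; the strings are not parsed.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype

# Hypothetical data: "day" holds Python ints, "day_text" holds digit strings.
df = pd.DataFrame(
    {"worker": ["A", "B"], "day": [2, 9], "day_text": ["2", "9"]},
    dtype=object,
)

soft = df.infer_objects()
print(soft.dtypes)

# The ints are recognised; the strings are left as non-numeric data.
print(is_integer_dtype(soft["day"]))
print(is_numeric_dtype(soft["day_text"]))
```

For columns like day_text, a hard conversion such as pd.to_numeric or astype is still needed.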

pandas >= 1.0

convert_dtypes casts your data to the most specific pandas extension dtype if possible.

df.convert_dtypes().dtypes

worker    string
day        Int64
tasks     string
dtype: object
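A comparable sketch for convert_dtypes, again on hypothetical data. It assumes pandas 1.0 or later and only checks the integer column's dtype name, since the exact string dtype reported can differ between pandas versions.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical data: "day" holds Python ints, "day_text" holds digit strings.
df = pd.DataFrame(
    {"worker": ["A", "B"], "day": [2, 9], "day_text": ["2", "9"]},
    dtype=object,
)

best = df.convert_dtypes()
print(best.dtypes)

# Ints become the nullable Int64 extension dtype; digit strings stay text.
day_dtype_name = str(best["day"].dtype)
text_is_numeric = is_numeric_dtype(best["day_text"])
```

Like infer_objects, this does not parse strings into numbers.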

Also see this answer by me for more information on "hard" versus "soft" conversions.

cs95
1

In my opinion the safest approach is to use pd.to_numeric in your apply call, which also lets you choose how errors are handled: coerce, raise or ignore. Once the columns are numeric you can safely call astype() on them, but I wouldn't suggest starting with astype():

df.apply(pd.to_numeric, errors='ignore')

If the column can't be converted to numeric, it will remain unchanged. (Note that errors='ignore' is deprecated as of pandas 2.2.)

df.apply(pd.to_numeric, errors='coerce')

The columns will be converted to numeric, and any values that can't be converted will be replaced with NaN.

df.apply(pd.to_numeric, errors='raise')

A ValueError will be raised if the column can't be converted to numeric.
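Putting the two steps together, here is a rough sketch on hypothetical data: coerce first, decide what to do with the resulting NaN values, then call astype(). Dropping the affected rows is just one possible choice made for illustration.

```python
import pandas as pd

# Hypothetical data: one value ("x") cannot be parsed as a number.
df = pd.DataFrame(
    {"day": ["2", "9", "x"], "tasks_done": ["1", "2", "3"]},
    dtype=object,
)

# Step 1: hard conversion; unparseable values become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")
print(numeric)

# Step 2: handle the NaN rows (dropped here), then astype() is safe.
clean = numeric.dropna().astype("int32")
print(clean.dtypes)

# With errors="raise" the same data fails instead of producing NaN.
try:
    df.apply(pd.to_numeric, errors="raise")
    raised = False
except ValueError:
    raised = True
```

Calling astype("int32") directly on numeric would fail here, because NaN cannot be stored in a plain integer column.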

Celius Stingher