
I need to identify which columns in a dataframe are decimals and which are strings.
Using df.dtypes gives 'object' for both column types:

import pandas as pd
import decimal

data = {'dec1': [1.1, 1.2], 'str1': ["a", "b"]}
df = pd.DataFrame(data)

# Convert dec1 to decimal.Decimal values; the column dtype stays 'object'.
df.dec1 = df.dec1.apply(lambda x: decimal.Decimal(x))

df.dtypes

dec1    object
str1    object
dtype: object

I am using the following code to find the decimal columns, but there has to be a more pythonic way for something so basic. What is it?

actual_col_types = df.iloc[0].apply(type)

df_decimals = df.loc[:, actual_col_types == decimal.Decimal]


Julien Massardier

2 Answers


Use isinstance, which is preferable to comparing against type directly:

mask = df.iloc[0].map(lambda x: isinstance(x, decimal.Decimal))
df_decimals = df.loc[:, mask]
print(df_decimals)
                                                dec1
0  1.10000000000000008881784197001252323389053344...
1  1.19999999999999995559107901499373838305473327...
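As a side note, isinstance also accepts a tuple of types, so the same mask pattern extends to selecting columns of several element types at once. A small sketch using the toy data from the question:

```python
import decimal
import pandas as pd

df = pd.DataFrame({'dec1': [1.1, 1.2], 'str1': ['a', 'b']})
df.dec1 = df.dec1.apply(decimal.Decimal)

# isinstance takes a tuple of types, so one mask can cover several.
mask = df.iloc[0].map(lambda x: isinstance(x, (decimal.Decimal, str)))
print(list(df.loc[:, mask].columns))  # ['dec1', 'str1']
```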
jezrael

Building on jezrael's answer: if your data has mixed types in a column, and the first row holds something like NaN, then a first-row check will fail to detect columns that contain the desired type.

Instead, we can check for the presence of any decimal value for each column like so:

from decimal import Decimal
...

decimal_columns = df.apply(lambda x: x.apply(lambda y: isinstance(y, Decimal)).any())
df_decimals = df.loc[:, decimal_columns]
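To illustrate the difference, here is a small sketch (the NaN-leading column is a made-up example): with NaN in the first row, the first-row check misses the column, while the column-wide `.any()` check still finds it.

```python
import decimal
import numpy as np
import pandas as pd

# A decimal column whose first row is NaN (a float, not a Decimal).
df = pd.DataFrame({
    'dec1': [np.nan, decimal.Decimal('1.2')],
    'str1': ['a', 'b'],
})

# First-row check: misses 'dec1' because row 0 holds NaN.
first_row_mask = df.iloc[0].map(lambda x: isinstance(x, decimal.Decimal))

# Column-wide check: True for any column containing at least one Decimal.
any_mask = df.apply(lambda col: col.apply(lambda y: isinstance(y, decimal.Decimal)).any())

print(list(df.loc[:, first_row_mask].columns))  # []
print(list(df.loc[:, any_mask].columns))        # ['dec1']
```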

Jon