13

I have a dataframe, for example:

1
1.3
2,5
4
5

With the following code, I am trying to find out the types of the individual cells of my pandas dataframe:

import re

for i in range(len(data.columns)):
    print("number of columns: " + str(len(data.columns)))
    for j in range(len(data[i])):
        # Replace the decimal dot with a comma; note that this turns the cell into a str
        data[i][j] = re.sub(r'(\d*)\.(\d*)', r'\1,\2', str(data[i][j]))
        print(data[i][j])

        print(" is of type: " + str(type(data[i][j])))
        if str(data[i][j]).isdigit():
            print(str(data[i][j]) + " contains a number")

The problem is that when a cell of the dataframe contains a dot, pandas thinks it is a string. So I used a regex to change the dot into a comma.

But after that, the types of all my dataframe cells changed to string. My question is: how can I tell whether a cell of the dataframe is an int or a float? I already tried isinstance(x, int).

Edit: How can I count the number of ints and floats from the output of df.apply(type), for example? I want to know how many cells of my column are int or float.

My second question is: why does the dataframe give the value 2.5 the str type?

0       <class 'int'>
1       <class 'str'>
2     <class 'float'>
3     <class 'float'>
4       <class 'int'>
5       <class 'str'>
6       <class 'str'>
wjandrea
John Smith
  • ... `df['col_name'].dtype`? – roganjosh Apr 19 '18 at 17:29
  • Welcome to SO. Please provide a **[mcve]**. Also see: [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – jpp Apr 19 '18 at 17:30
  • Possible duplicate of [Determining Pandas Column DataType](https://stackoverflow.com/questions/41262370/determining-pandas-column-datatype) – RCA Apr 19 '18 at 17:31
  • A column won't have mixed dtypes, it will default to some `object` type if it's mixed – roganjosh Apr 19 '18 at 17:31

2 Answers

20

If you have a column with different types, e.g.

>>> df = pd.DataFrame({"c": [1, "a", 10.43, [1,3,4]]})
>>> df
           c
0          1
1          a
2      10.43
3  [1, 3, 4]

Pandas will simply report that this Series has dtype object. However, you can get the type of each entry by applying the type function:

>>> df['c'].apply(type)
0      <class 'int'>
1      <class 'str'>
2    <class 'float'>
3     <class 'list'>
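
To count how many entries of each type a column holds, as the question's edit asks, the per-entry types can be tallied with value_counts(). The following is a minimal sketch that assumes the same df as above; mapping each entry to its type name (rather than the type object itself) and the variable names type_counts, n_int and n_float are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Same mixed-type column as in the example above.
df = pd.DataFrame({"c": [1, "a", 10.43, [1, 3, 4]]})

# Map every entry to the name of its Python type, then tally the names.
type_counts = df["c"].map(lambda value: type(value).__name__).value_counts()
print(type_counts)

# Look up individual counts; .get() falls back to 0 if a type never occurs.
n_int = type_counts.get("int", 0)
n_float = type_counts.get("float", 0)
print(n_int, n_float)
```

Using type names as the index keeps the lookup simple, since the labels are plain strings.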

However, if you have a dataset with very different data types, you probably should reconsider its design.

wjandrea
rafaelc
  • Thanks. I want to know why, in my pandas dataframe, the number 2.5, which has a dot, is a str? – John Smith Apr 19 '18 at 19:22
  • `2.5` is a float; `"2.5"` is a str. Pandas read your file as str by default; you'd have to convert all digits to floats/integers manually. Pandas usually infers types correctly, but that is hard to do with mixed types. – rafaelc Apr 19 '18 at 19:29
  • Good answer, but I believe the OP was asking about an entire dataframe and not just a single column. If I do `df.apply(type)`, for instance, it will print out the type of each column and not of each cell. – demongolem Jun 18 '20 at 15:56
  • @demongolem if you want each cell, use `df.applymap(type)` – rafaelc Jun 18 '20 at 17:55
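
The conversion mentioned in these comments could look roughly like the sketch below. It assumes the column was read as text and that some values use a comma as the decimal separator (like the 2,5 in the question); the Series name raw and its contents are hypothetical stand-ins for the asker's data.

```python
import pandas as pd

# Hypothetical column read as text; "2,5" uses a comma as the decimal separator.
raw = pd.Series(["1", "1.3", "2,5", "4", "5"])

# Normalise the decimal separator, then convert to numbers.
# errors="coerce" turns anything that still cannot be parsed into NaN.
numeric = pd.to_numeric(raw.str.replace(",", ".", regex=False), errors="coerce")

print(numeric.dtype)
print(numeric.tolist())
```

If the data comes from a file that consistently uses commas as the decimal separator, passing decimal="," to pd.read_csv may avoid the manual replacement.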
2

To add to @rafaelc's answer, it's possible to find integers and floats with:

>>> d = pd.DataFrame({'a': [1, 2., '3'], 'b': [4, 5, 6.]})

>>> d.applymap(pd.api.types.is_integer)
       a      b
0   True  False
1  False  False
2  False  False

>>> d.applymap(pd.api.types.is_float)
       a     b
0  False  True
1   True  True
2  False  True

(Notice the automatic upcasting in the second column.)

You can then use .sum() for counting them: d.applymap(pd.api.types.is_float).sum()
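
As a rough, self-contained sketch of that counting step: the snippet below rebuilds the same d and sums the boolean masks per column. The elementwise helper is an assumption added here because DataFrame.applymap is deprecated in newer pandas releases (2.1 and later) in favour of DataFrame.map; the snippet picks whichever one is available.

```python
import pandas as pd

d = pd.DataFrame({'a': [1, 2., '3'], 'b': [4, 5, 6.]})

# Use DataFrame.map where it exists (pandas >= 2.1), otherwise fall back to applymap.
elementwise = d.map if hasattr(pd.DataFrame, "map") else d.applymap

# Summing a boolean mask counts the True cells in each column.
int_counts = elementwise(pd.api.types.is_integer).sum()
float_counts = elementwise(pd.api.types.is_float).sum()

print(int_counts)
print(float_counts)
```

Calling .sum() a second time on either result (for example float_counts.sum()) gives the total over the whole dataframe.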

To my knowledge, there is no function in pd.api.types for finding strings.
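
One possible workaround is a plain isinstance check applied cell by cell. This is only a sketch: is_str is a hypothetical helper written for this example, not a pandas function, and the elementwise fallback is an assumption to cover both older and newer pandas versions.

```python
import pandas as pd

d = pd.DataFrame({'a': [1, 2., '3'], 'b': [4, 5, 6.]})

def is_str(value):
    """Return True if the cell holds a Python str (illustrative helper)."""
    return isinstance(value, str)

# Use DataFrame.map where it exists (pandas >= 2.1), otherwise fall back to applymap.
elementwise = d.map if hasattr(pd.DataFrame, "map") else d.applymap

str_mask = elementwise(is_str)
print(str_mask)
print(str_mask.sum())  # number of string cells per column
```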

wojciech