Python Pandas: get average number of decimals

Question

I used pandas to generate a dataframe with several rows and columns.

I am now trying to determine the average number of decimals for each column. For example:

 A     B      C 
 10.1 22.541 21.44 
 10.2 23.548 19.4
 11.2 26.547 15.45

The program would return 1 for A, 3 for B and 2 for C

Would you have an efficient method to do this, given that the dataframe I'm handling has about 16000 rows?

Thank you

Can you give some examples to make it clear what you mean by "number of decimals"? It's ambiguous as it stands. – Mark Dickinson, Nov 11 '19 at 20:14
Please provide some sample input along with the desired output. – Cleb, Nov 11 '19 at 20:15
For example, the program should return 2 for 2.98 and 1 for 2.1, and take the mean of these values for the column :) – AlexJJ, Nov 11 '19 at 20:16
Computing the number of decimals after the point is tricky: it's not a particularly well-defined notion, thanks to the use of binary floating-point. See [this excellent answer](https://stackoverflow.com/a/17838332/270986) from Keith Thompson on the subject. (It's about C, but the principle is the same: Python uses the same floating-point format.) – Mark Dickinson, Nov 11 '19 at 20:22
Why are you trying to do this? It seems like a really odd and impractical idea. – AMC, Nov 11 '19 at 21:36
I've been asked to include this in a program, so I'm doing it, odd idea or not ;P – AlexJJ, Nov 12 '19 at 20:10
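
To illustrate the floating-point point raised in the comments, here is a minimal sketch (not part of the original thread) comparing the exact binary value stored for 10.1 with the short string Python displays. It assumes that "number of decimals" means the digits shown after the point in that short string, which is also what a string-based count sees.

```python
from decimal import Decimal

x = 10.1

# The exact value stored in binary floating-point has many more digits than "10.1".
exact = Decimal(x)
print(exact)

# repr() gives the shortest decimal string that round-trips to the same float.
shortest = repr(x)
print(shortest)

# Counting the characters after the '.' in that string gives the count the question expects.
decimals = len(shortest.split('.')[1])
print(decimals)
```

Under this assumption the count depends on how the float is rendered as text, not on the stored value itself.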

Alex · Accepted Answer · 2019-11-11T21:00:43.963


Updated code

OK, here it is. It may be a little bit complicated ;)

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [10.1, 10.2, 11.2], 'B': [22.541, 23.548, 26.547], 'C': [21.44, 19.4, 15.45]})
df

Out[1]:
      A       B      C
0  10.1  22.541  21.44
1  10.2  23.548  19.4
2  11.2  26.547  15.45


# For each column: convert the values to strings, keep the part after the '.', and average its length
[df[col].astype(str).str.split('.', expand=True)[1].str.len().mean() for col in df.columns]

Out[2]:
[1.0, 3.0, 1.6666666666666667]

Step-by-step breakdown

# Build one row of decimal counts per original column, then transpose so the layout matches df
df1 = pd.DataFrame([df[col].astype(str).str.split('.', expand=True)[1].str.len().values for col in df.columns]).T
df1

Out[3]:
    0   1   2
0   1   3   2
1   1   3   1
2   1   3   2

df1.mean()

Out[4]:
0    1.000000
1    3.000000
2    1.666667
dtype: float64
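
As a possible variant of the steps above, the following sketch keeps the original column labels (A, B, C) instead of 0, 1, 2. It is an illustration rather than the answer's own code; the names `counts` and `means` are introduced here, and it assumes every column holds floats whose string form contains a '.'.

```python
import pandas as pd

df = pd.DataFrame({'A': [10.1, 10.2, 11.2],
                   'B': [22.541, 23.548, 26.547],
                   'C': [21.44, 19.4, 15.45]})

# Per-cell count of characters after the '.', keeping the original column names.
counts = df.astype(str).apply(lambda s: s.str.split('.').str[1].str.len())

# Column-wise mean of those counts.
means = counts.mean()
print(means)
```

Using `.str[1]` instead of `expand=True` followed by `[1]` yields a missing value rather than a KeyError when a string has no '.' in it.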

edited Nov 11 '19 at 21:00

answered Nov 11 '19 at 20:14


Sorry, I wasn't very clear: I'm asking to count the digits after the decimal point and average them :) – AlexJJ Nov 11 '19 at 20:24
Could you please show how you get 1 for 'A' from the numbers 10.1, 10.2, 11.2? – Alex Nov 11 '19 at 20:28
For 10.1 it returns 1, for 10.11 it returns 2, and for 10.112 it returns 3. It's the number of decimals, the number of characters after the decimal point :) – AlexJJ Nov 11 '19 at 20:31
I've tried it on my example dataframe and it works, but on my real dataframe it raises an error: "KeyError: 1" – AlexJJ Nov 11 '19 at 20:59
I just simplified it a bit, see my update. And what kind of error does it give you? – Alex Nov 11 '19 at 21:02
Could you show the output of "df.columns" and "df.to_dict()"? – Alex Nov 11 '19 at 21:04
From testing, I think it's because one of my columns is a boolean column with True and False. I'm trying to replace df with a slice of the wanted columns (without this column) :) – AlexJJ Nov 11 '19 at 21:07
Right. If even one column is not of float type (its values have no '.' character), the split does not produce two elements, so [1] cannot be selected, which gives exactly that KeyError. Just for testing, you can drop the column like this: df = df.drop('name_of_boolean_column', axis=1) – Alex Nov 11 '19 at 21:10
It works if I copy list(df.columns) and remove the unwanted columns. Thanks very much! Also, would you know how to return 0 if the only decimal digit is a 0 (that is, when value == int(value))? – AlexJJ Nov 11 '19 at 21:18
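
One way to address both points from the comments above (the KeyError caused by non-float columns, and returning 0 when the only decimal digit is a 0) might look like the sketch below. The helper name `mean_decimals` and the sample column `flag` are hypothetical, introduced only for illustration; the sketch assumes that only float columns should be counted.

```python
import pandas as pd

def mean_decimals(df):
    """Average count of digits after the '.' for each float column (hypothetical helper)."""
    # Keep only float columns, so boolean, integer and text columns are skipped.
    floats = df.select_dtypes(include='float')
    # Part of each value's string form that follows the '.'.
    frac = floats.astype(str).apply(lambda s: s.str.split('.').str[1])
    counts = frac.apply(lambda s: s.str.len())
    # Treat a lone '0' fraction (for example 11.0) as zero decimals.
    counts = counts.where(frac != '0', 0)
    return counts.mean()

df = pd.DataFrame({'A': [10.1, 10.2, 11.0],
                   'flag': [True, False, True],
                   'C': [21.44, 19.4, 15.45]})
result = mean_decimals(df)
print(result)
```

Selecting by dtype avoids listing the unwanted columns by hand, at the cost of silently ignoring any column that is not stored as a float.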
