1

I would like to compute the histogram bin corresponding to the mean value given this dataframe:

      class          Area
0       1-10  1.675883e+06
1      11-20  1.026733e+06
2      21-30  6.102651e+05
3      31-40  6.281576e+05
4      41-50  4.710967e+05
5      51-60  4.236068e+05
6      61-70  4.015372e+05
7      71-80  3.619052e+05
8      81-90  3.386376e+05
9     91-100  3.333406e+05
10   101-110  2.423542e+05
11   111-120  2.388251e+05
12   121-130  1.440134e+05
13   131-140  1.849219e+05
14   141-150  5.982432e+06

So, the answer should be which class corresponds to the mean value. The mean value is based on the area of each class. Not sure how to proceed with this.

The answer will be the name of the class corresponding to the mean e.g. 31 - 40

Stefan
user308827
  • How come the answer is 31-40 if 11-20 is closer to the mean (or is that an arbitrary example)? Do you have an additional constraint that the class should be corresponding to, but smaller than the mean? – TNT May 17 '16 at 01:05
  • thanks @TNT, the answer was arbitrary since I did not know it. – user308827 May 17 '16 at 01:14
  • If that is what you are looking for, it's fine with me, but please don't use the two solutions interchangeably, since they do different things. Stefan's solution gives you the insertion point (the last value that is lower than the mean), regardless of whether the lower or the higher class is closer to the mean in absolute terms, while my solution gives you the class with the smallest deviation from the mean, regardless of the direction of the deviation. – TNT May 18 '16 at 07:18

3 Answers

1

If this is a histogram, it isn't unimodal. Are you sure you want the mean of the area value?

To see the problem, you can plot it:

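import matplotlib.pyplot as plt

# Bars show the area of each class; the red line marks the mean area.
# Note that the column is named 'Area' (capital A) in the question's dataframe.
ax = df.plot(x='class', y='Area', kind='bar')
ax.axhline(df['Area'].mean(), color='r')
plt.show()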

You could choose several different classes that are far apart from each other. If you are looking for the class whose area is closest to the mean area (but be sure you know what that means for your data):

First calculate the absolute distance from the mean for each class, then choose the class at the index of the minimal distance:

df['dist'] = (df['Area'] - df['Area'].mean()).abs()
df.loc[df['dist'].idxmin(), 'class']  # single closest class (the first one if there are ties)
df.loc[df['dist'] == df['dist'].min(), 'class']  # all classes tied for the minimal distance

See here for how to color that bar red in the plot.
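As a rough sketch of one way to highlight the closest bar (not necessarily what the linked post does), the snippet below rebuilds the question's dataframe and passes a per-bar color list to matplotlib. The helper names `closest_to_mean` and `plot_with_highlight` are made up for this illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question; class labels are "1-10", "11-20", ..., "141-150".
df = pd.DataFrame({
    "class": [f"{lo}-{lo + 9}" for lo in range(1, 150, 10)],
    "Area": [1.675883e+06, 1.026733e+06, 6.102651e+05, 6.281576e+05,
             4.710967e+05, 4.236068e+05, 4.015372e+05, 3.619052e+05,
             3.386376e+05, 3.333406e+05, 2.423542e+05, 2.388251e+05,
             1.440134e+05, 1.849219e+05, 5.982432e+06],
})


def closest_to_mean(frame):
    """Return the index label of the row whose Area is nearest to the mean Area."""
    dist = (frame["Area"] - frame["Area"].mean()).abs()
    return dist.idxmin()


def plot_with_highlight(frame):
    """Bar chart of Area per class; the bar nearest to the mean is drawn in red."""
    nearest = closest_to_mean(frame)
    colors = ["red" if idx == nearest else "tab:blue" for idx in frame.index]
    fig, ax = plt.subplots()
    ax.bar(frame["class"], frame["Area"], color=colors)
    ax.axhline(frame["Area"].mean(), color="k", linestyle="--")  # mean area
    ax.tick_params(axis="x", labelrotation=90)
    return ax


nearest_class = df.loc[closest_to_mean(df), "class"]
print(nearest_class)  # 11-20
```

Call `plot_with_highlight(df)` followed by `plt.show()` to display the chart.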

Community
TNT
1

You can use .searchsorted() to get the insertion position. This is a bit odd, though, because the distribution is indeed bimodal, as noted, and the approach relies on sorting the data by area.

df = df.sort_values('Area').reset_index(drop=True)
df.loc[df.Area.searchsorted(df.Area.mean()) - 1]

    class      Area
11  31-40  628157.6
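To make the insertion point more concrete, here is a self-contained sketch that rebuilds the question's data and looks at both neighbours of that position: the class just below the mean and the class just above it. It uses `numpy.searchsorted` on the sorted values and assumes the areas are not all identical, so that the mean falls strictly inside the range.

```python
import numpy as np
import pandas as pd

# Data copied from the question; class labels are "1-10", "11-20", ..., "141-150".
df = pd.DataFrame({
    "class": [f"{lo}-{lo + 9}" for lo in range(1, 150, 10)],
    "Area": [1.675883e+06, 1.026733e+06, 6.102651e+05, 6.281576e+05,
             4.710967e+05, 4.236068e+05, 4.015372e+05, 3.619052e+05,
             3.386376e+05, 3.333406e+05, 2.423542e+05, 2.388251e+05,
             1.440134e+05, 1.849219e+05, 5.982432e+06],
})

ordered = df.sort_values("Area").reset_index(drop=True)
mu = ordered["Area"].mean()

# Position at which the mean would have to be inserted to keep the areas sorted.
pos = int(np.searchsorted(ordered["Area"].to_numpy(), mu))

below = ordered.loc[pos - 1]  # largest area that is still under the mean
above = ordered.loc[pos]      # smallest area at or above the mean

print(below["class"], above["class"])  # 31-40 11-20

# The neighbour with the smaller absolute gap is the class closest to the mean.
closer = below if mu - below["Area"] <= above["Area"] - mu else above
print(closer["class"])  # 11-20
```

For this data the row below the insertion point is 31-40, while the row above it, 11-20, is the one nearer to the mean.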
Stefan
0

First read the data into a dataframe (df). Then you can compute the mean area and find the last class whose area lies below it:

import pandas as pd

df = df.sort_values('Area')  # sort the dataframe by area, ascending
mu = df['Area'].mean()

cl = None  # stays None if no area is below the mean
for _, row in df.iterrows():
    if row['Area'] < mu:
        cl = row['class']  # ends as the class with the largest area below the mean

print('mean lies in class')
print(cl)

Hope this works! Let me know.
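The same result can probably be obtained without sorting or looping. The sketch below is an alternative, not the code above: it rebuilds the question's data, keeps only the rows whose area is below the mean with a boolean mask, and takes the row with the largest area among them.

```python
import pandas as pd

# Data copied from the question; class labels are "1-10", "11-20", ..., "141-150".
df = pd.DataFrame({
    "class": [f"{lo}-{lo + 9}" for lo in range(1, 150, 10)],
    "Area": [1.675883e+06, 1.026733e+06, 6.102651e+05, 6.281576e+05,
             4.710967e+05, 4.236068e+05, 4.015372e+05, 3.619052e+05,
             3.386376e+05, 3.333406e+05, 2.423542e+05, 2.388251e+05,
             1.440134e+05, 1.849219e+05, 5.982432e+06],
})

mu = df["Area"].mean()

# Rows whose area is strictly below the mean area.
below_mean = df[df["Area"] < mu]

# Among those rows, pick the class with the largest area.
cl = below_mean.loc[below_mean["Area"].idxmax(), "class"]
print(cl)  # 31-40
```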

Niladri Gomes