1

I have 3 users s1 who has 10 dollars, s2 10,20 dollars, and s3 20,20,30 dollars. I want to calculate percentage of users who had 10, 20 and 30 dollars. Is my interpretation correct here?

input

import pandas as pd
df1 = (pd.DataFrame({'users': ['s1', 's2', 's2', 's3', 's3', 's3'],
              'dollars': [10,10,20,20,20,30]}))

output

% of subjects who had 10 dollors        0.4
% of subjects who had 20 dollors        0.4
% of subjects who had 30 dollors        0.2

tried

df1.groupby(['dollars']).agg({'dollars': 'sum'}) / df1['dollars'].sum() * 100
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
ferrelwill
  • 771
  • 2
  • 8
  • 20
  • Can you explain how count first `0.4` ? – jezrael Feb 21 '23 at 06:18
  • I interpreted like this. Users who had 10 dollars are 2 (s1 and s2) and total users are 3. I divided 2/3 = 0.66. I repeated the same with other and took the sum (1.66). Finally divided 0.66 with 1.66 = 0.4. – ferrelwill Feb 21 '23 at 06:24
  • I solved it but I am doubtful if this is the right answer. ```df2 = df1.drop_duplicates().groupby(['dollars']).count() df2.reset_index() df3 = df2['users']/len(df2.index) df3 = df3.reset_index() df3['percentage'] = df3['users']/df3['users'].sum()``` – ferrelwill Feb 21 '23 at 06:37
  • Do you want `pd.crosstab(df1['users'], df1['dollars']).gt(0).mean().mul(100)`? This doesn't give the 0.4 though – mozway Feb 21 '23 at 06:39
  • @mozway yes, it gives the percentage and then I normalize this to the total. – ferrelwill Feb 21 '23 at 06:44

2 Answers2

3

Use DataFrameGroupBy.nunique for count unique users per dollars, divide number of unique dollars and last divide sum:

out = df1.groupby('dollars')['users'].nunique().div(df1['dollars'].nunique())
out = out / out.sum()

print (out)
dollars
10    0.4
20    0.4
30    0.2
Name: users, dtype: float64
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
2

to get the percentage of users that have each kind of bill you can use a crosstab:

out = pd.crosstab(df1['users'], df1['dollars']).gt(0).mean().mul(100)

output:

dollars
10    66.666667
20    66.666667
30    33.333333
dtype: float64

If you want normalized counts:

out/out.sum()

Output:

dollars
10    0.4
20    0.4
30    0.2
dtype: float64
mozway
  • 194,879
  • 13
  • 39
  • 75
  • 1
    Note that you can also easily get the count for "at least N bills", "exactly 1 bill", "no bill", etc. by changing the condition (`gt(N)`/`eq(1)`/`eq(0)`) – mozway Feb 21 '23 at 07:00