How to make a fixed number of groups by percentile from a dataframe in pandas

Question

I am looking for a way to make n (e.g. 20) groups in a dataframe by a specific column by percentile. (data type is float). I am not sure if the group by quantile function can take care of this, and if it can, how the code should look like.

There are 3 rows a, b, c

i.e. Data are sorted by column 'a', and make 20 groups

Group 1 = 0 to 5 percentile
Group 2 = 5 to 10 percentile
.
.
.
Group 20 = 95 to 100 percentile.

would there also be a way to find the mean a, b, and c of each group, and sort them into another dataframe?

Use `pandas.qcut`. This should give you 20 quantile. `pd.qcut(df['percentile'], 20)` — XXavier, Jul 20 '20 at 21:36
Thanks a lot. Have added additional edits for a separate question. It's my first day on StackOverflow and pandas so the questions might sound very stupid. — ROK_Dusty, Jul 20 '20 at 21:50
@ROK_Dusty please read up on how to create a sample dataframe that shows input and also one for expected output. More info here: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — David Erickson, Jul 20 '20 at 21:52
Welcome to Stack Overflow. Please take the [tour](https://stackoverflow.com/tour), read about [what's on-topic](https://stackoverflow.com/help/on-topic), and read [How to Ask a Good Question](https://stackoverflow.com/help/how-to-ask). – And don't forget to accept an answer (check-mark next to an answer) when it answers your question. In this way your question won't show up as unanswered anymore. — Ivo Mori, Jul 21 '20 at 00:51

score 1 · Answer 1 · answered Jul 20 '20 at 22:54

1

You can create 20 equal size bins using this. df['newcol'] = pd.qcut(df.a,np.linspace(.05, 1, 19, 0), duplicates='drop') Then you can groupby the newcol to find the summary stats of a,b and c columns df.groupby(['newcol']).mean()

answered Jul 20 '20 at 22:54

XXavier

1,206
1
10
14

@ROK_Dusty don't use comments to say _thanks_ – instead up-vote helpful and good answers. – Ivo Mori Jul 21 '20 at 00:58

score 0 · Answer 2 · answered Jul 20 '20 at 22:55

0

# group by percentile
profitdf['quantile_a'] = pd.qcut(profitdf['a'], 20)
profitdf['quantile_b'] = pd.qcut(profitdf['b'], 20)
quantile_a = profitdf.groupby(['quantile_a']).mean()
quantile_b = profitdf.groupby(['quantile_b']).mean()

Solved. Thank you everyone.

answered Jul 20 '20 at 22:55

ROK_Dusty

1
1

Answering your own question is a great way of sharing your findings and (newly gained) knowledge; still please have a look at [How do I write a good answer?](https://stackoverflow.com/help/how-to-answer) for reference and possibly on how to improve your current answer further. – Ivo Mori Jul 21 '20 at 00:56

How to make a fixed number of groups by percentile from a dataframe in pandas

2 Answers2