0

I am looking for a way to make n (e.g. 20) groups in a dataframe by a specific column by percentile. (data type is float). I am not sure if the group by quantile function can take care of this, and if it can, how the code should look like.

There are 3 rows a, b, c

i.e. Data are sorted by column 'a', and make 20 groups

  • Group 1 = 0 to 5 percentile
  • Group 2 = 5 to 10 percentile
    .
    .
    .
  • Group 20 = 95 to 100 percentile.

would there also be a way to find the mean a, b, and c of each group, and sort them into another dataframe?

ROK_Dusty
  • 1
  • 1
  • Use `pandas.qcut`. This should give you 20 quantile. `pd.qcut(df['percentile'], 20)` – XXavier Jul 20 '20 at 21:36
  • Thanks a lot. Have added additional edits for a separate question. It's my first day on StackOverflow and pandas so the questions might sound very stupid. – ROK_Dusty Jul 20 '20 at 21:50
  • 2
    @ROK_Dusty please read up on how to create a sample dataframe that shows input and also one for expected output. More info here: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – David Erickson Jul 20 '20 at 21:52
  • Welcome to Stack Overflow. Please take the [tour](https://stackoverflow.com/tour), read about [what's on-topic](https://stackoverflow.com/help/on-topic), and read [How to Ask a Good Question](https://stackoverflow.com/help/how-to-ask). – And don't forget to accept an answer (check-mark next to an answer) when it answers your question. In this way your question won't show up as unanswered anymore. – Ivo Mori Jul 21 '20 at 00:51

2 Answers2

1

You can create 20 equal size bins using this. df['newcol'] = pd.qcut(df.a,np.linspace(.05, 1, 19, 0), duplicates='drop') Then you can groupby the newcol to find the summary stats of a,b and c columns df.groupby(['newcol']).mean()

XXavier
  • 1,206
  • 1
  • 10
  • 14
0
# group by percentile
profitdf['quantile_a'] = pd.qcut(profitdf['a'], 20)
profitdf['quantile_b'] = pd.qcut(profitdf['b'], 20)
quantile_a = profitdf.groupby(['quantile_a']).mean()
quantile_b = profitdf.groupby(['quantile_b']).mean()

Solved. Thank you everyone.

ROK_Dusty
  • 1
  • 1
  • Answering your own question is a great way of sharing your findings and (newly gained) knowledge; still please have a look at [How do I write a good answer?](https://stackoverflow.com/help/how-to-answer) for reference and possibly on how to improve your current answer further. – Ivo Mori Jul 21 '20 at 00:56