17

I have a DataFrame with some columns. I'd like to add a new column where each row value is the quantile rank of one existing column.

I can use DataFrame.rank to rank a column, but then I don't know how to get the quantile number of this ranked value and to add this quantile number as a new colunm.

Example: if this is my DataFrame

df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=['a', 'b'])

   a    b
0  1    1
1  2   10
2  3  100
3  4  100

and I'd like to know the quantile number (using 2 quantiles) of column b. I'd expect this result:

   a    b  quantile
0  1    1    1
1  2   10    1
2  3  100    2
3  4  100    2
feetwet
  • 3,248
  • 7
  • 46
  • 84
luca
  • 7,178
  • 7
  • 41
  • 55
  • 1
    I think you want DataFrame.[quantile](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.quantile.html) with q=[0, 0.25, 0.5, 0.75, 1]... Then you should be able to [rank](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rank.html) on that column – jeyoor Jul 13 '16 at 15:52
  • Ah, that makes sense. Thanks! – luca Jul 13 '16 at 15:59
  • You're welcome, posting comment above as answer below. – jeyoor Jul 13 '16 at 16:01
  • I am trying to apply your suggestion but It seems I cannot do it – luca Jul 13 '16 at 16:44
  • It looks like getting the resulting dataframe from quantile back into the original dataframe requires a few extra operations... Working on an edit – jeyoor Jul 13 '16 at 19:25
  • New edit with example code... how does it work for you? – jeyoor Jul 13 '16 at 19:49

4 Answers4

26

I discovered it is quite easy:

df['quantile'] = pd.qcut(df['b'], 2, labels=False)

   a    b  quantile
0  1    1         0
1  2   10         0
2  3  100         1
3  4  100         1

Interesting to know "difference between pandas.qcut and pandas.cut"

Community
  • 1
  • 1
luca
  • 7,178
  • 7
  • 41
  • 55
  • 3
    This generally throws a `SettingWithCopyWarning`, at least in 0.16.1 (which I'm still using). – feetwet Dec 26 '16 at 20:37
  • 1
    @feetweet, thanks for reporting the issue with old pandas version (<= 0.16.1) and for the alternative solution. Anyway for versions >= 0.18 it works without any warning. – luca Dec 28 '16 at 09:12
  • 1
    to avoid `SettingWithCopyWarning`, use `df.loc[:, "quantile"] = ...` – jkr Nov 30 '21 at 18:28
2

df['quantile'] = pd.qcut(df['b'], 2, labels=False) seems to tend to throw a SettingWithCopyWarning.

The only general way I have found of doing this without complaints is like:

quantiles = pd.qcut(df['b'], 2, labels=False)
df = df.assign(quantile=quantiles.values)

This will assign the quantile rank values as a new DataFrame column df['quantile'].

A solution for a more generalized case, in which one wants to partition the cut by multiple columns, is given here.

Community
  • 1
  • 1
feetwet
  • 3,248
  • 7
  • 46
  • 84
1

You can use DataFrame.quantile with q=[0.25, 0.5, 0.75] on the existing column to produce a quartile column.

Then, you can DataFrame.rank on that quartile column.

See below for an example of adding a quartile column:

import pandas as pd

d = {'one' : pd.Series([40., 45., 50., 55, 60, 65], index=['val1', 'val2', 'val3', 'val4', 'val5', 'val6'])}
df = pd.DataFrame(d)

quantile_frame = df.quantile(q=[0.25, 0.5, 0.75])
quantile_ranks = []
for index, row in df.iterrows():
    if (row['one'] <= quantile_frame.ix[0.25]['one']):
        quantile_ranks.append(1)
    elif (row['one'] > quantile_frame.ix[0.25]['one'] and row['one'] <= quantile_frame.ix[0.5]['one']):
        quantile_ranks.append(2)
    elif (row['one'] > quantile_frame.ix[0.5]['one'] and row['one'] <= quantile_frame.ix[0.75]['one']):
        quantile_ranks.append(3)
    else:
        quantile_ranks.append(4)

df['quartile'] = quantile_ranks

Note: There's probably a more idiomatic way to accomplish this with Pandas... but it's beyond me

jeyoor
  • 938
  • 1
  • 11
  • 20
0
df.sort_values(['b'],inplace = True)
df.reset_index(inplace = True,drop = True)
df.reset_index(inplace = True)
df.rename(columns = {'index':'row_num'},inplace = True)
df['quantile'] = df['row_num'].apply(lambda x: math.ceil(10*(x+1)/df.shape[0]))

I used to use this, but I guess I can use quantile