0

I have a sample df

A B
X 30
Y 150
Z 450
XX 300

I need to create another column C that buckets column B based on some breakpoints

Breakpts = [50,100,250,350]

A B C
X 30 '0-50'
Y 150 '100-250'
Z 450 '>350'
XX 300 '250-350'

I have the following code that works

def conditions(i): 
    if i <=50: return '0-50'
    if i > 50 and i <=100: return '50-100'
    if i > 100 and i <=250: return '100-250'
    if i > 250 and i <=350: return '250-350'
    if i > 350: return '>350'

df['C']=df['B'].apply(conditions)

However I would like to make the breakpts dymanic. So if I use a different breakpts like [100,250,300,400] the code should automatically create different buckets based on the breakpts.

Any ideas on how to do this?

Lenormju
  • 4,078
  • 2
  • 8
  • 22
sudhasethu
  • 53
  • 1
  • 7
  • See the first answer [here](https://stackoverflow.com/questions/45273731/binning-column-with-python-pandas/45273750) and use `pd.cut()`. – BLimitless Jun 07 '21 at 17:46
  • Maybe I misread. By dynamic, do you mean that when you change the bins, the labels should adjust automatically? – sophocles Jun 08 '21 at 16:09

1 Answers1

2

As pointed in the comments, pd.cut() would be the way to go. You can make the breakups dynamic and set them yourself:

import pandas as pd
import numpy as np

bins = [0,50, 100,250, 350, np.inf]
labels = ["'0-50'","'50-100'","'100-250'","'250-350'","'>350'"]
df['C'] = pd.cut(df['B'], bins=bins, labels=labels)

Have a look also at pandas.qcut which is a quantiles based discretization function.


Alternatively, with np.select:

col = 'B'
conditions = [
              df[col].between(0,50),   # inclusive = True is the default
              df[col].between(50,100),  
              df[col].between(100,250),
              df[col].between(250,350),
              df[col].ge(350)
             ]
choices = ["'0-50'","'50-100'","'100-250'","'250-350'","'>350'"]
    
df["C"] = np.select(conditions, choices, default=np.nan)

Both print:

    A    B          C
0   X   30     '0-50'
1   Y  150  '100-250'
2   Z  450     '>350'
3  XX  300  '250-350'
sophocles
  • 13,593
  • 3
  • 14
  • 33