I know we can use the following code to create a decile column based on one column of a data set, even when the data contains ties (see "How to qcut with non unique bin edges?"):

import numpy as np
import pandas as pd

# create a sample
np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(100, 3), columns=list('ABC'))
# sort by column C
df = df.sort_values('C', ascending=False)
# create decile by column C
df['decile'] = pd.qcut(df['C'].rank(method='first'), 10, labels=np.arange(10, 0, -1))

Is there an easy way to save the cut points from df and then use the same cut points to bin a new data set? For example:

np.random.seed([1])
df_new = pd.DataFrame(np.random.rand(100, 1), columns=list('C'))
Gavin

1 Answer

You can use the .left attribute of each interval to collect the bin edges:

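import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
s2 = pd.Series([2, 3, 4, 6, 1])

# intervals that qcut actually assigned to values of s1
a = pd.qcut(s1, 10).unique()

# left edge of each interval (sorted, so cut gets increasing bins), plus an open upper bound
bins = sorted(x.left for x in a) + [np.inf]

pd.cut(s2, bins=bins)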
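
As a possible alternative to collecting the .left values, pd.qcut can return the edges directly via retbins=True. The following is a minimal sketch on the same s1 and s2. The names s1_binned, edges and s2_binned are only illustrative, and replacing the outer edges with infinities is an assumption about how out-of-range values in the new data should be handled.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
s2 = pd.Series([2, 3, 4, 6, 1])

# retbins=True returns the computed quantile edges alongside the binned values
s1_binned, edges = pd.qcut(s1, 10, retbins=True)

# Assumption: values outside the range of s1 should fall into the outermost bins,
# so the first and last edges are replaced with -inf and +inf
edges = np.asarray(edges, dtype=float).copy()
edges[0], edges[-1] = -np.inf, np.inf

# Reuse the saved edges on the new data
s2_binned = pd.cut(s2, bins=edges)
```

Unlike the .unique() approach, this keeps every edge even when one of the quantile bins happens to be empty in s1.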
BENY
  • But what if the data set has ties? I added `.rank(method='first')` to the code above to avoid the issue with ties; how can I use that to cut new data? – Gavin Oct 03 '18 at 02:32
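
For the rank-based case raised in the comment, one possible sketch (not part of the answer above) is to read the cut points back from the original data: take the largest C in each decile and use those values as edges for pd.cut on the new frame. The names upper, edges and reproduced are illustrative. This assumes no group of tied values straddles a decile boundary; if one does, two edges coincide, pd.cut raises on the duplicate edges, and no value-based cut point can reproduce the rank-based split exactly.

```python
import numpy as np
import pandas as pd

# Data from the question
np.random.seed([3, 1415])
df = pd.DataFrame(np.random.rand(100, 3), columns=list('ABC'))
labels = np.arange(10, 0, -1)
df['decile'] = pd.qcut(df['C'].rank(method='first'), 10, labels=labels)

# Largest C in each decile, in increasing order of value
upper = df['C'].groupby(df['decile'].astype(int)).max().sort_values().to_numpy()

# Interior cut points plus open-ended outer edges (an assumption for out-of-range values)
edges = np.concatenate(([-np.inf], upper[:-1], [np.inf]))

# Sanity check: cutting the original values with the saved edges gives back the same deciles
reproduced = pd.cut(df['C'], bins=edges, labels=labels)

# Apply the saved cut points to the new data
np.random.seed([1])
df_new = pd.DataFrame(np.random.rand(100, 1), columns=list('C'))
df_new['decile'] = pd.cut(df_new['C'], bins=edges, labels=labels)
```

Because pd.cut uses right-closed intervals by default, a new value equal to a saved cut point lands in the same decile as the original row that defined it.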