-2

I have the following data frame (called aa1):

d = {'Index' : pd.Series([25, 7, 7, 5, 5, 8])}
aa1 = pd.DataFrame(d)

I created a new column called Quartile_Index that scores the quartile for the Index column values (note: this was the best solution I could find since the pd.qcut library does not allow you to have duplicate bin edges):

quantile_frame = aa1.quantile(q=[0.25, 0.5, 0.75])
quantile_ranks = []
for index, row in aa1.iterrows():
    if (row['Index'] <= quantile_frame.ix[0.25]['Index']):
        quantile_ranks.append('q1')
    elif (row['Index'] >= quantile_frame.ix[0.25]['Index'] and row['Index'] <= quantile_frame.ix[0.5]['Index']):
        quantile_ranks.append('q2')
    elif (row['Index'] >= quantile_frame.ix[0.5]['Index'] and row['Index'] <= quantile_frame.ix[0.75]['Index']):
        quantile_ranks.append('q3')
    else:
        quantile_ranks.append('q4')

aa1['Quartile_Index'] = quantile_ranks

This is the best solution I can come up with, but I need help with two questions to further optimize my code/logic:

  1. Is there a simpler way trigger a quartile column with simpler code? pd.qcut does not allow you to have duplicate bin edges so that's not a solution.

  2. I need to produce the quantile column execution across 30 other pandas data frames (aa1, aa2, aa3, etc..). I don't want to repeat this 30 times so is there a solution out there? I'm not that advanced with loops just yet.

Ulises Sotomayor
  • 159
  • 2
  • 14
  • 1
    As an aside, the use of `.ix[]` has been depreciated in favor of `iloc` and `loc`. You can learn more about that here: https://stackoverflow.com/questions/31593201/pandas-iloc-vs-ix-vs-loc-explanation-how-are-they-different – BenG Jul 03 '18 at 21:10
  • Thanks! Changed the code with answer below. – Ulises Sotomayor Jul 03 '18 at 21:36

1 Answers1

0

Solution for looping through dataframes (using iloc instead of .ix[]):

data = [aa1,aa2]
for i in data:
    quantile_frame = i.quantile(q=[0.25, 0.5, 0.75])
    quantile_ranks = []
    for index, row in i.iterrows():
        if (row['Index'] <= quantile_frame.iloc[0]['Index']):
            quantile_ranks.append('q1')
        elif (row['Index'] >= quantile_frame.iloc[0]['Index'] and row['Index'] <= quantile_frame.iloc[1]['Index']):
            quantile_ranks.append('q2')
        elif (row['Index'] >= quantile_frame.iloc[1]['Index'] and row['Index'] <= quantile_frame.iloc[2]['Index']):
            quantile_ranks.append('q3')
        else:
            quantile_ranks.append('q4')
    i['Quartile_Index'] = quantile_ranks
Ulises Sotomayor
  • 159
  • 2
  • 14