I have the following data frame (called aa1):
d = {'Index' : pd.Series([25, 7, 7, 5, 5, 8])}
aa1 = pd.DataFrame(d)
I created a new column called Quartile_Index that scores the quartile for the Index column values (note: this was the best solution I could find since the pd.qcut library does not allow you to have duplicate bin edges):
quantile_frame = aa1.quantile(q=[0.25, 0.5, 0.75])
quantile_ranks = []
for index, row in aa1.iterrows():
if (row['Index'] <= quantile_frame.ix[0.25]['Index']):
quantile_ranks.append('q1')
elif (row['Index'] >= quantile_frame.ix[0.25]['Index'] and row['Index'] <= quantile_frame.ix[0.5]['Index']):
quantile_ranks.append('q2')
elif (row['Index'] >= quantile_frame.ix[0.5]['Index'] and row['Index'] <= quantile_frame.ix[0.75]['Index']):
quantile_ranks.append('q3')
else:
quantile_ranks.append('q4')
aa1['Quartile_Index'] = quantile_ranks
This is the best solution I can come up with, but I need help with two questions to further optimize my code/logic:
Is there a simpler way trigger a quartile column with simpler code? pd.qcut does not allow you to have duplicate bin edges so that's not a solution.
I need to produce the quantile column execution across 30 other pandas data frames (aa1, aa2, aa3, etc..). I don't want to repeat this 30 times so is there a solution out there? I'm not that advanced with loops just yet.