I have a Pandas dataframe called "df" with the following columns:
   Income  Income_Quantile  Score_1  Score_2  Score_3
0  100000                5       75       75      100
1   70000                4       55       77       80
2   50000                3       66       50       60
3   12000                1       22       60       30
4   35000                2       61       50       53
5   30000                2       66       35       77
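To make the question reproducible, the frame above can be rebuilt directly. This is a small sketch that only copies the values from the printout; nothing else is assumed.

```python
import pandas as pd

# Sample data copied from the printout above
df = pd.DataFrame(
    {
        "Income": [100000, 70000, 50000, 12000, 35000, 30000],
        "Income_Quantile": [5, 4, 3, 1, 2, 2],
        "Score_1": [75, 55, 66, 22, 61, 66],
        "Score_2": [75, 77, 50, 60, 50, 35],
        "Score_3": [100, 80, 60, 30, 53, 77],
    }
)
print(df)
```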
I also have a for loop that selects subsets of the dataframe by "Income_Quantile" and then drops the "Income_Quantile" column, since it was only needed to slice the main dataframe, "df".
Here is the code:
for level in df.Income_Quantile.unique():
    # Rows for this quantile, minus the column used to select them
    df_s = df.loc[df.Income_Quantile == level].drop(columns='Income_Quantile')
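As written, the loop reassigns `df_s` on every pass, so only the last quantile's subset is still available once it finishes. One way to keep all of them is to collect the subsets in a dict keyed by quantile, for example with `groupby`. This is a sketch; the name `subsets` is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Income": [100000, 70000, 50000, 12000, 35000, 30000],
        "Income_Quantile": [5, 4, 3, 1, 2, 2],
        "Score_1": [75, 55, 66, 22, 61, 66],
        "Score_2": [75, 77, 50, 60, 50, 35],
        "Score_3": [100, 80, 60, 30, 53, 77],
    }
)

# One sub-frame per quantile, with the slicing column removed
subsets = {
    level: grp.drop(columns="Income_Quantile")
    for level, grp in df.groupby("Income_Quantile")
}

for level, df_s in subsets.items():
    print(level, df_s.shape)
```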
Now, I want to calculate Spearman's rank correlation between the "Income" variable and each of "Score_1", "Score_2" and "Score_3" in "df_s".
I would also like to concatenate the results into a single frame with the following structure:
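Assuming `scipy.stats.spearmanr` is used for the test, note that it returns the coefficient and a p-value but no t-statistic. If the usual t-approximation is what is wanted (SciPy's default p-value for `spearmanr` is based on the same statistic), it can be recovered from the coefficient r_s and the number of rows n in a subset:

```latex
t = r_s \sqrt{\frac{n - 2}{1 - r_s^{2}}}, \qquad \nu = n - 2 \ \text{degrees of freedom}
```

Because there are n - 2 degrees of freedom, a subset needs at least three rows for the test to be defined, and t is infinite when r_s = ±1. In the sample above, quantiles 1, 3, 4 and 5 have one row each and quantile 2 has two, so none of them is large enough for a meaningful test.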
Income Quantile Score_1 Score_2 Score_3
correlation …. …. …. ….
p-value …. …. …. ….
t-statistic …. …. …. ….
I think that the approach below, from a previous question I asked, could be helpful:
# "correlations" will be a helper function that calculates the Spearman's rank correlation of each variable with the "Income" variable and outputs the p-value and t-statistic of the test for each variable.
result = {key: correlations(val) for key, val in df_s.items()}
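A minimal sketch of that idea, assuming SciPy is available: a hypothetical `correlations` helper takes the Income column and one score column and returns a Series holding the three statistics; the dict comprehension from the question builds one table per subset, and `pd.concat` stacks those tables with the quantile as the outer index level. `correlations`, `spearman_table` and `spearman_by_quantile` are illustrative names, not library functions, and the t-statistic uses the standard t-approximation for Spearman's rho.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

STATS = ["correlation", "p-value", "t-statistic"]

def correlations(income, score):
    """Hypothetical helper: Spearman's rho, its p-value and t-statistic."""
    rho, p = spearmanr(income, score)
    rho = np.float64(rho)  # NumPy float, so dividing by zero gives inf, not an error
    n = len(income)
    # t = rho * sqrt((n - 2) / (1 - rho^2)); inf when rho = +/-1
    with np.errstate(divide="ignore", invalid="ignore"):
        t = rho * np.sqrt((n - 2) / (1 - rho**2))
    return pd.Series([rho, p, t], index=STATS)

def spearman_table(df_s, target="Income"):
    # Same pattern as in the question: one entry per score column
    result = {
        key: correlations(df_s[target], val)
        for key, val in df_s.drop(columns=target).items()
    }
    return pd.DataFrame(result)  # rows: STATS, columns: Score_1, Score_2, ...

def spearman_by_quantile(df, group="Income_Quantile", target="Income"):
    tables = {
        level: spearman_table(grp.drop(columns=group), target)
        for level, grp in df.groupby(group)
    }
    # Outer index level = quantile, inner level = statistic name
    return pd.concat(tables, names=[group, "statistic"])
```

Calling `spearman_by_quantile(df)` returns one row per (quantile, statistic) pair and one column per score. With the six-row sample the numbers are not meaningful, since single-row quantiles give NaN and the two-row quantile has zero degrees of freedom.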
However, I have no idea how to carry out the next steps.
Does anyone have any pointers on how to get from where I am now to where I want to be? This is my weakest area in Python, and I am stuck.