It sounds like you have a single DataFrame with the following columns: stage, productivity, factor A, factor B, ..., factor X (your input) and you want a correlation matrix of the productivity and factors by stage, in particular to look at the correlation between each factor and productivity at each stage (your output). If I'm understanding correctly, then you could go about creating the tables in your image like this (say your data is in DataFrame df
, here I've used random values):
Correlation matrix by stage:
corr_mat_by_stage = df.groupby( "stage" ).corr()
With stages A and B and factors a-c this would look like this (the middle table in your image):
>>> corr_mat_by_stage.round( 2 )
factor a factor b factor c productivity
stage
A factor a 1.00 -0.28 0.81 -0.78
factor b -0.28 1.00 0.07 0.12
factor c 0.81 0.07 1.00 -0.39
productivity -0.78 0.12 -0.39 1.00
B factor a 1.00 0.45 0.59 0.52
factor b 0.45 1.00 0.82 0.80
factor c 0.59 0.82 1.00 0.75
productivity 0.52 0.80 0.75 1.00
Then, to get the more concise table that just has the productivity row/column for each stage, you can filter for that column and pivot to reshape so that you have the stages as columns like in your third table in the image:
prod_factor_corr_by_stage = corr_mat_by_stage["productivity"].reset_index().pivot( index="level_1", columns="stage", values="productivity" )
This would look like this (the right-most table in your image):
>>> prod_factor_corr_by_stage.round( 2 )
stage A B
level_1
factor a -0.78 0.52
factor b 0.12 0.80
factor c -0.39 0.75
productivity 1.00 1.00