
I have three lists (location, variable1 and variable2) that I have loaded into a pandas DataFrame:

import pandas as pd

# location, variable1 and variable2 are three equal-length lists
df = pd.DataFrame({'x': location, 'y1': variable1, 'y2': variable2})

I would like to plot the correlation of y1 with y2 as a function of x. More precisely, I would like to bin the y1 and y2 values by x location, compute the correlation of y1 with y2 within each bin, and then plot those correlations as a line across the whole x domain. The final plot would have correlation on the y-axis and location on the x-axis.
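A minimal sketch of that procedure, assuming equal-width bins along x (pd.cut) and made-up random data standing in for the real location, variable1 and variable2 lists. The bin centres (Interval.mid) give one x value per correlation, so bin_centres and corr_by_bin can be passed straight to a line plot.

```python
import numpy as np
import pandas as pd

# Hypothetical example data standing in for location, variable1, variable2.
rng = np.random.default_rng(0)
location = rng.uniform(0, 100, 2000)
variable1 = rng.normal(size=2000)
# y2 tracks y1 more closely as x grows, so the binned correlation should rise with x.
variable2 = (location / 100) * variable1 + rng.normal(scale=0.5, size=2000)

df = pd.DataFrame({'x': location, 'y1': variable1, 'y2': variable2})

# Equal-width bins along x (pd.cut), not equal-count bins.
df['bins'] = pd.cut(df['x'], bins=10)

# One Pearson correlation of y1 with y2 per bin.
corr_by_bin = df.groupby('bins', observed=True)[['y1', 'y2']].apply(
    lambda g: g['y1'].corr(g['y2'])
)

# Bin centres to use as the x coordinate of the line plot.
bin_centres = np.array([interval.mid for interval in corr_by_bin.index])
```

With equal-width bins, sparse regions of x get bins with few points, so their correlations are noisier; quantile bins (pd.qcut) trade that for bins of unequal width.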

I have previously done something not completely dissimilar to this using scipy's binned_statistic function (scipy.stats.binned_statistic) to plot conditional means, but I don't think I can easily extend that to correlations. I would also like to get better at using pandas anyway, so I'm trying to avoid that route if at all possible.

I'm sure this has been asked before, but everything I have come across seems to be about plotting multiple distributions.

Answer


I've more or less arrived at a solution. Implementing something similar to what was used here, I have:

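nbins = 20
# quantile-based bins: each bin holds roughly the same number of rows
df['bins'] = pd.qcut(df['x'], q=nbins)
# one 2x2 correlation matrix per bin; every other row (the 'y1' rows) and the
# last column ('y2') give corr(y1, y2) for each bin
plotdatadf = df.groupby('bins', observed=True)[['y1', 'y2']].corr().iloc[0::2, -1]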

This gives a Series holding the correlation coefficient of y1 and y2 for each bin, where the bins are quantiles of x, so each bin contains roughly the same number of observations.
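To see why the iloc[0::2, -1] slice works, here is the same snippet run on made-up data (the random x, y1 and y2 are assumptions, not the real lists). groupby(...).corr() stacks one 2x2 matrix per bin; the sketch also checks that qcut yields equal-sized bins and shows a label-based alternative, .xs('y1', level=1)['y2'], that does not depend on row positions.

```python
import numpy as np
import pandas as pd

# Hypothetical data in place of the real location/variable lists.
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({'x': rng.uniform(0, 50, n), 'y1': rng.normal(size=n)})
df['y2'] = df['y1'] + rng.normal(size=n)

nbins = 20
df['bins'] = pd.qcut(df['x'], q=nbins)

# qcut puts roughly n / nbins rows in each bin.
rows_per_bin = df['bins'].value_counts(sort=False)

# groupby(...).corr() stacks one 2x2 matrix per bin: the index is
# (bin, variable) and the columns are y1, y2.
corr_matrices = df.groupby('bins', observed=True)[['y1', 'y2']].corr()

# Positional selection: every 'y1' row, last ('y2') column.
by_position = corr_matrices.iloc[0::2, -1]

# Label-based selection of the same numbers, leaving the bin as the only index level.
by_label = corr_matrices.xs('y1', level=1)['y2']
```

The label-based version returns a Series indexed only by bin, which is easier to map back onto the original rows.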

I can now go back to my original dataframe and add another column, of the original length, holding these correlation values: each row gets the correlation of the bin it falls in (if a row is in bin 1, its value is the correlation for bin 1, and so on). This column can then be plotted as y against my existing x as a line plot.
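A sketch of that copying step, again on made-up data: Series.map looks up each row's bin in the per-bin correlations, which avoids writing the bin-by-bin conditions by hand. plot_binned_corr is a hypothetical helper name, and calling it requires matplotlib.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the real lists.
rng = np.random.default_rng(3)
n = 1000
df = pd.DataFrame({'x': rng.uniform(0, 10, n), 'y1': rng.normal(size=n)})
df['y2'] = 0.5 * df['y1'] + rng.normal(size=n)

df['bins'] = pd.qcut(df['x'], q=20)

# Flat Series: one correlation per bin, indexed by the bin intervals.
corr_by_bin = (
    df.groupby('bins', observed=True)[['y1', 'y2']].corr().xs('y1', level=1)['y2']
)

# Copy each bin's correlation onto every row in that bin. map() on a
# categorical column can return a categorical result, so cast back to float.
df['corr'] = df['bins'].map(corr_by_bin).astype(float)


def plot_binned_corr(frame):
    """Line plot of the per-row correlation column against x (needs matplotlib)."""
    # Sort by x so the line is drawn left to right rather than in row order.
    return frame.sort_values('x').plot(x='x', y='corr', legend=False)
```

Because every row in a bin shares one value, the resulting line is step-like. Plotting corr_by_bin against the bin midpoints ([iv.mid for iv in corr_by_bin.index]) gives one point per bin instead and avoids the extra column.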