I'm trying to make some basic plots so I can better understand what is happening in my data. I currently have 4 variables, each with 200 x 387 data points. I've stored everything in a 3D array, with the 3rd dimension indexing the different variables.
So far I have produced some scatterplots of var1 vs. var2. However, I would like to add a conditional mean curve on top of each scatterplot: the average var1 value (y-axis) for any given var2 value (x-axis). I am quite new to Python, though, so I'm fairly sure the way I'm currently thinking of approaching this is far from the most efficient.
My current idea is to flatten the data for each variable into a 1D vector, split var2 into bins of some reasonable width, and then compute the average of var1 within each bin. I would store these averages in a new vector and plot them against the bin centres.
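The binning idea above can be sketched in plain NumPy roughly as follows. This is a minimal sketch, assuming equal-width bins over the full range of var2 and that var1/var2 are 2D slices of the 3D array (e.g. `data[:, :, 0]` and `data[:, :, 1]`, a hypothetical layout). The function name `binned_conditional_mean` is made up for illustration. Bins that contain no points come back as NaN.

```python
import numpy as np


def binned_conditional_mean(x, y, n_bins=30):
    """Hypothetical helper: mean of y within equal-width bins of x.

    x, y may be any shape (e.g. 200 x 387); they are flattened first.
    Returns (bin_centres, means); empty bins get NaN.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()

    # n_bins + 1 edges spanning the full range of x
    edges = np.linspace(x.min(), x.max(), n_bins + 1)

    # Digitizing against the interior edges gives indices 0..n_bins-1,
    # and the maximum x value falls into the last bin.
    idx = np.digitize(x, edges[1:-1])

    # Per-bin sum of y and per-bin count, then their ratio is the mean
    sums = np.bincount(idx, weights=y, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts  # 0 / 0 -> NaN for empty bins

    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, means
```

With the 3D array this might be called as `centres, means = binned_conditional_mean(data[:, :, 1], data[:, :, 0])` and drawn over the scatterplot with `plt.plot(centres, means)`. If SciPy is available, `scipy.stats.binned_statistic(x, y, statistic="mean", bins=30)` does the same binning in one call.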
Is this a very stupid way of doing this? From what I've searched, it seems like pandas may have a simple way of doing this, but given how new to Python I am, I'm also not sure whether going straight to pandas would be overkill.
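For comparison, the pandas route is fairly short: `pd.cut` assigns each var2 value to a bin and `groupby(...).mean()` averages var1 per bin. This is a minimal sketch under the same assumptions as before (equal-width bins, 2D inputs that get flattened); the function name `conditional_mean_pandas` is hypothetical. Note that `pd.cut` with an integer `bins` widens the lowest edge by 0.1% of the range, so the bin centres differ very slightly from an exact `np.linspace` split.

```python
import numpy as np
import pandas as pd


def conditional_mean_pandas(x, y, n_bins=30):
    """Hypothetical helper: mean of y per equal-width bin of x via pandas."""
    df = pd.DataFrame({"x": np.ravel(x), "y": np.ravel(y)})

    # Assign each x value to one of n_bins equal-width intervals
    df["bin"] = pd.cut(df["x"], bins=n_bins)

    # observed=False keeps empty bins (mean is NaN), one row per bin, in order
    result = df.groupby("bin", observed=False)["y"].mean()

    # Midpoint of each Interval in the index, for use as plot x-values
    centres = np.array([interval.mid for interval in result.index])
    return centres, result.to_numpy()
```

Both versions compute the same thing; pandas mainly saves writing the bin bookkeeping by hand, so it is not overkill if it is already installed.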
Thank you in advance for any and all responses!