I'm using scatter_matrix for correlation visualization and calculating correlation values using corr(). Is it possible to have the scatter_matrix visualization draw the regression line in the scatter plots?
-
check the edit section of the most voted answer in this question: http://stackoverflow.com/questions/8154511/drawing-a-correlation-graph-in-matplotlib – Nikos Tavoularis Oct 14 '16 at 07:05
-
Thanks Nikos. In this application I am specifically trying to provide the visualization using the scatter_matrix function. I am generating hundreds of thousands of plots, and it gets a little simpler if I can combine some of them in one view. I'm also worried about speed: for example, it takes 1 minute to do 1500 correlations, but 12 minutes if I add plots with them. – chemnteach Oct 15 '16 at 15:23
1 Answer
I think this is a misleading question/thought process.
If you think of your data as strictly two-dimensional, then a regression line on a scatter plot makes sense. But let's say you have 5 dimensions of data that you are plotting in your scatter matrix. In that case the regression for each pair of dimensions is not an accurate representation of the global regression, i.e. the one fitted across all dimensions at once.
I would be wary of presenting that to anyone, as I can easily see where it could create confusion.
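To make that concrete, here is a small sketch on synthetic data. It assumes "global regression" means an ordinary least-squares fit on all predictors at once; the variable names and coefficients are made up for illustration. When two predictors are correlated, the slope of a pairwise fit absorbs part of the other predictor's effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (assumption for illustration): x2 is correlated with x1,
# and y depends on both with true coefficients 1 and 1.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = x1 + x2 + 0.1 * rng.normal(size=n)

# Pairwise fit: y against x1 only, as a scatter-matrix panel would show it.
pairwise_slope = np.polyfit(x1, y, 1)[0]

# Joint fit: y against x1 and x2 together (last column is the intercept).
X = np.column_stack([x1, x2, np.ones(n)])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

print("pairwise slope of y on x1:", round(pairwise_slope, 2))
print("joint coefficient on x1:  ", round(coefs[0], 2))
```

With this setup the pairwise slope should land near 1.8 while the joint coefficient stays near 1, which is the kind of mismatch a reader of the scatter matrix could misinterpret.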
That being said, if you don't care about a regression across all of your dimensions, then you could write your own function to do this. A quick walk-through of the steps might be:
1. Identify the number of dimensions N.
2. Create the figure.
3. Run a double for loop over N: the first loop walks down the rows, the second walks across the columns.
4. At each position add a subplot. On the diagonal (the kde/hist position) plot the kde or histogram; everywhere else calculate the regression and plot the scatter cloud with the regression line.
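The steps above could be sketched roughly as follows. This is not part of pandas' scatter_matrix; the helper name `scatter_matrix_with_fit` is hypothetical, the diagonal uses a histogram rather than a kde, and the line is a plain least-squares fit from `np.polyfit`. The demo DataFrame is synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suited to batch rendering to files
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_matrix_with_fit(df, figsize=(8, 8)):
    """Hypothetical helper: N x N grid with histograms on the diagonal and
    scatter plots plus a least-squares line in every other panel."""
    cols = list(df.columns)
    n = len(cols)                                    # step 1: number of dimensions
    fig, axes = plt.subplots(n, n, figsize=figsize,  # step 2: create the figure
                             squeeze=False)
    for i, ycol in enumerate(cols):                  # step 3: walk down the rows
        for j, xcol in enumerate(cols):              #         and across the columns
            ax = axes[i, j]
            if i == j:
                # step 4, diagonal: distribution of a single variable
                ax.hist(df[xcol].dropna(), bins=20)
            else:
                # step 4, off-diagonal: scatter cloud plus fitted line
                pair = df[[xcol, ycol]].dropna()
                x = pair[xcol].to_numpy()
                y = pair[ycol].to_numpy()
                slope, intercept = np.polyfit(x, y, 1)
                ax.scatter(x, y, s=5, alpha=0.5)
                xs = np.array([x.min(), x.max()])
                ax.plot(xs, slope * xs + intercept, color="red")
            if i == n - 1:
                ax.set_xlabel(xcol)
            if j == 0:
                ax.set_ylabel(ycol)
    return fig, axes


# Synthetic demo data (assumption for illustration only).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
demo["c"] = 2 * demo["a"] + 0.1 * demo["c"]
fig, axes = scatter_matrix_with_fit(demo)
```

When generating many figures in a loop, calling `fig.savefig(...)` followed by `plt.close(fig)` after each one keeps memory from growing; whether this is faster than scatter_matrix for your data is something you would need to measure.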
