
I am using the classic Kaggle house price dataset. I want to plot each feature column (bedrooms, bathrooms, sqft_living, etc.) against the price target to check for correlation, but with, say, 3 or 4 plots per row to make the output more compact.

What I have done so far:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context('poster')
sns.set_style('darkgrid')

df = pd.read_csv('kc_house_data.csv')
cols = [i for i in list(df.columns) if i not in ['id','price']]
for col in cols:
    fig, ax = plt.subplots(figsize=(12,8))
    df.plot(kind='scatter', x=col, y='price', ax=ax, s=10, alpha=0.5)
    plt.show()

This uses the built-in pandas plotting functionality, but it draws each plot as a separate figure, one below the other.

What I would like to have is something like the pandas scatter matrix plot where multiple (in this case 4) plots appear on one line. (I don't need to plot the distribution along the diagonal as shown below).

[Image: example pandas scatter matrix plot]

How can I put multiple plots in one row, using the pandas scatter_matrix functionality or some other Python plotting functionality?

Nice to haves:

  1. Labeled axes

  2. Correlation between each feature and price displayed on each plot

PyRsquared

1 Answer


Don't create a new figure on every iteration. Instead, create one figure with several subplot columns and draw each plot on its own axes by passing the ax parameter to df.plot():

fig, axes = plt.subplots(1, len(cols), figsize=(12,8), squeeze=False)

for i, col in enumerate(cols):
    df.plot(kind='scatter', x=col, y='price', ax=axes[0, i], s=10, alpha=0.5)

plt.show()
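
The snippet above puts every feature in a single row, which gets cramped with many columns. A possible extension, sketched here under some assumptions, wraps the plots into rows of 4 and also covers the two nice-to-haves (labeled axes and the correlation with price on each plot). The helper name plot_features_vs_target and its parameters are hypothetical, not part of pandas or matplotlib, and it assumes only numeric columns should be plotted, so a string-typed column such as a date is skipped:

```python
import math

import matplotlib.pyplot as plt
import pandas as pd


def plot_features_vs_target(df, target="price", ncols=4, skip=("id",)):
    """Scatter each numeric feature against `target`, `ncols` plots per row.

    Hypothetical helper for illustration; returns (fig, axes).
    """
    # Assumption: only numeric columns are plotted (non-numeric ones are skipped).
    numeric_cols = df.select_dtypes(include="number").columns
    features = [c for c in numeric_cols if c != target and c not in skip]

    # Enough rows to hold every feature, at least one so subplots() is valid.
    nrows = max(1, math.ceil(len(features) / ncols))
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False
    )
    flat_axes = axes.ravel()

    for ax, col in zip(flat_axes, features):
        df.plot(kind="scatter", x=col, y=target, ax=ax, s=10, alpha=0.5)
        # Pearson correlation between this feature and the target.
        r = df[col].corr(df[target])
        ax.set_title(f"{col} (r = {r:.2f})")
        ax.set_xlabel(col)
        ax.set_ylabel(target)

    # Hide the unused axes in the last row, if any.
    for ax in flat_axes[len(features):]:
        ax.set_visible(False)

    fig.tight_layout()
    return fig, axes
```

With the question's data this could be called as plot_features_vs_target(df, target='price', ncols=4), followed by plt.show(). The figsize scaling of 5 x 4 inches per subplot is an arbitrary choice and may need adjusting, especially with the 'poster' seaborn context.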
Peter