I am using the classic Kaggle house price dataset. I want to plot each of the feature columns (bedrooms, bathrooms, sqft_living, etc.) against the price target to check for any correlation, but I want to have, say, 3 or 4 plots on each row to make the output more compact.
What I have done so far:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context('poster')
sns.set_style('darkgrid')
df = pd.read_csv('kc_house_data.csv')
cols = [i for i in list(df.columns) if i not in ['id','price']]
for col in cols:
    # One separate figure per feature column
    fig, ax = plt.subplots(figsize=(12,8))
    df.plot(kind='scatter', x=col, y='price', ax=ax, s=10, alpha=0.5)
    plt.show()
So, I am using the inbuilt pandas plotting functionality, but this plots each figure on a new line.
What I would like to have is something like the pandas scatter matrix plot, where multiple (in this case 4) plots appear on one line. (I don't need the distributions that the scatter matrix draws along the diagonal.)
How can I have multiple plots in one row using the pandas scatter_matrix functionality or some other Python plotting functionality?
Nice to haves:
Labeled axes
Correlation between each feature and price displayed on each plot
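To make the desired layout concrete, here is a rough sketch of one way it might be done with a plain matplotlib subplot grid rather than scatter_matrix. The helper name plot_features_vs_target and its ncols parameter are hypothetical, the demo DataFrame is synthetic stand-in data (not the real kc_house_data.csv), and the sketch assumes every feature column is numeric. The value in each title is the Pearson correlation from pandas Series.corr.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_features_vs_target(df, target, features, ncols=4):
    """Hypothetical helper: scatter each feature against `target`, `ncols` plots per row."""
    nrows = math.ceil(len(features) / ncols)
    # squeeze=False keeps `axes` two-dimensional even when there is a single row
    fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False)
    flat_axes = axes.ravel()

    for ax, col in zip(flat_axes, features):
        ax.scatter(df[col], df[target], s=10, alpha=0.5)
        ax.set_xlabel(col)       # labeled axes
        ax.set_ylabel(target)
        r = df[col].corr(df[target])  # Pearson correlation with the target
        ax.set_title(f"r = {r:.2f}")

    # Hide any leftover axes in the last row
    for ax in flat_axes[len(features):]:
        ax.set_visible(False)

    fig.tight_layout()
    return fig, axes


# Synthetic stand-in data (an assumption, used only so the sketch runs on its own)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "bedrooms": rng.integers(1, 6, size=n),
    "bathrooms": rng.integers(1, 4, size=n),
    "sqft_living": rng.normal(2000, 500, size=n),
    "sqft_lot": rng.normal(8000, 2000, size=n),
    "floors": rng.integers(1, 4, size=n),
})
demo["price"] = 150 * demo["sqft_living"] + rng.normal(0, 50_000, size=n)

features = [c for c in demo.columns if c != "price"]
fig, axes = plot_features_vs_target(demo, "price", features, ncols=4)
```

With the real data, the call would presumably look like plot_features_vs_target(df, 'price', cols, ncols=4) followed by plt.show(), after dropping any non-numeric columns from cols.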