I'm working in a Jupyter notebook and trying to generate scatter plots (with best-fit lines) of two variables for each value of a third, categorical variable (around 350 values). Here's my code so far:
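import matplotlib.pyplot as plt
import seaborn as sns

nrows = 20
ncols = 20  # 20 x 20 = 400 slots, enough for the ~350 categories
fig = plt.figure(figsize=(20, 20), dpi=100)
# df is the existing DataFrame holding all three variables
for i, third_variable_value in enumerate(df.third_variable.unique()):
    ax = fig.add_subplot(nrows, ncols, i + 1)  # subplot indices are 1-based
    subset = df[df.third_variable == third_variable_value]
    sns.regplot(data=subset, x='first_variable', y='second_variable', ax=ax)
fig.tight_layout()
fig.savefig('test.jpg', dpi=400)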
I've read through the questions here and here about using an increased DPI (since matplotlib requires the figure to fit on screen) and about using tight_layout() to prevent overlapping.
However, in the saved figure the font is too big relative to the plot area, and while the axes shrink, the points stay the same size. Is there a way to just shrink each subplot as a whole? Ideally, I'd like each subplot to keep the same plot-area aspect ratio as a single figure, with a fixed ratio of plot area to font size.
These are two sample views of the saved file.
I know I can do it using something like sns.lmplot(data=df, x='first_variable', y='second_variable', col='third_variable', sharey=False, sharex=False, col_wrap=30), but I'd like to do it in the above manner because I eventually want to color each point by the value of a fourth variable while drawing only one best-fit line, and I will be replacing the sns.regplot() call with a matplotlib function.
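For context, the eventual replacement for sns.regplot() might look roughly like the sketch below. It assumes a straight-line least-squares fit via np.polyfit. The helper name scatter_with_fit, the synthetic data, the marker size, and the colormap are all placeholders for illustration and not part of the code above.

```python
import numpy as np
import matplotlib.pyplot as plt


def scatter_with_fit(ax, x, y, c):
    """Scatter x vs. y colored by c, plus a single line fitted to all points.

    Hypothetical helper: a stand-in for whatever eventually replaces sns.regplot().
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # One scatter call; colors come from the (assumed) fourth variable.
    ax.scatter(x, y, c=c, s=4, cmap="viridis")
    # Degree-1 polyfit returns coefficients highest power first: slope, intercept.
    slope, intercept = np.polyfit(x, y, deg=1)
    xs = np.array([x.min(), x.max()])
    ax.plot(xs, slope * xs + intercept, color="black", linewidth=1)
    return slope, intercept


# Made-up data standing in for the rows of one third_variable value.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)
c = rng.uniform(0, 1, 50)  # placeholder for the fourth variable

fig, ax = plt.subplots(figsize=(3, 3))
slope, intercept = scatter_with_fit(ax, x, y, c)
```

In the loop above, the call would take the place of sns.regplot(), with ax being the subplot created by fig.add_subplot().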