I would like to create a scatter plot in matplotlib from the following data:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'year': [2008, 2014, 2019, 2019.25, 2019.5, 2020],
                   'y': [2, 3, 8, 12, 63, 71],
                   'total_students': [800, 1000, 4000, 4500, 11000, 37000],
                   'male_students': [600, 700, 2000, 2100, 5100, 27000]})

I want to plot year (unevenly spaced) on the x-axis and y on the y-axis. I would like the marker size to show the two extra variables (total number of students and number of male students) by overlaying two dots per data point, each sized accordingly. The radius of the markers should therefore scale in a logical way, so that the viewer can estimate the proportion of male to female students.

I've tried a few things and had the following problems:

  • When changing the size in the scatter command, the scaling is not as expected (e.g. s=10 doesn't have twice the radius of s=5).
  • I've tried adding circles instead, with both plt.Circle() and the circle function explained in this post: https://stackoverflow.com/a/24567352/5895788. However, both require me to set the aspect to equal, which collapses my x-axis completely. I don't quite understand why this happens; maybe it is because of my time scale data? Not setting the aspect to equal, or doing ax.set_aspect(1./ax.get_data_ratio()), causes the circles to become elliptical.

I'm not sure if this has any value, but here are some examples of what I tried:

## Option 1: scatter with s parameter
scaling=0.05
plt.scatter(df.year,df.y,marker="o",zorder=1,c="red",s=df.total_students*scaling)
plt.scatter(df.year,df.y,marker="o",zorder=2,c="blue",s=df.male_students*scaling)
plt.show()


## Option 2: Trying to add circles for the first data point without equal aspect ratio
fig=plt.figure()
ax=fig.add_subplot(111)
plt.ylim((-5,100))
plt.xlim((2007,2021))

scaling=0.01
ax.add_patch(plt.Circle((2008,2),800*scaling))
plt.show()


## Option 3: Trying to set equal aspect ratio
fig=plt.figure()
ax=fig.add_subplot(111,aspect="equal")
plt.ylim((-5,100))
plt.xlim((2007,2021))

scaling=0.01
ax.add_patch(plt.Circle((2008,2),800*scaling))
plt.show()
  • Note that the `s` parameter is an area. So, `s=100` has double the radius of `s=25`. However, setting the aspect ratio depending on your data ratio should work. You should take care that the data ratio is calculated after the `xlim`/`ylim` are set (and after the last call to `plt.scatter()`). – JohanC Nov 25 '22 at 19:41
  • I see, thank you! The area comment helped a lot, and I was able to manually adjust the scaling accordingly. – Jer Sto Nov 27 '22 at 15:59
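
For later readers, here is a minimal sketch of the manual adjustment the comments point to. It is not the asker's actual code. It relies only on the fact that `s` in `plt.scatter()` is a marker area in points squared, so the radius grows with the square root of `s`. The helper `marker_sizes`, its `max_area` default and the `scale_radius` switch are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def marker_sizes(values, largest, max_area=3000.0, scale_radius=False):
    """Return scatter `s` values (points^2) for `values`, relative to `largest`.

    Hypothetical helper. With scale_radius=False the marker AREA is
    proportional to the value; with scale_radius=True the marker RADIUS is.
    """
    frac = np.asarray(values, dtype=float) / float(largest)
    if scale_radius:
        # s is an area, so squaring the fraction makes the radius linear in it
        frac = frac ** 2
    return max_area * frac


df = pd.DataFrame({'year': [2008, 2014, 2019, 2019.25, 2019.5, 2020],
                   'y': [2, 3, 8, 12, 63, 71],
                   'total_students': [800, 1000, 4000, 4500, 11000, 37000],
                   'male_students': [600, 700, 2000, 2100, 5100, 27000]})

# Use one common reference so both layers share the same scale
largest = df['total_students'].max()

fig, ax = plt.subplots()
ax.scatter(df['year'], df['y'], c='red', zorder=1,
           s=marker_sizes(df['total_students'], largest))
ax.scatter(df['year'], df['y'], c='blue', zorder=2,
           s=marker_sizes(df['male_students'], largest))
ax.set_xlabel('year')
ax.set_ylabel('y')
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. With the default setting, the blue area divided by the red area equals the male share of students. Passing `scale_radius=True` to both calls makes the radii proportional to the counts instead. Because scatter markers are sized in points rather than data units, they stay circular whatever the axis limits or aspect ratio are.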

0 Answers