
Assume we have a dataframe with 4 individuals' scores on 2 different tests, where the 3rd column tells us whether each individual passed or failed overall.

df:

[10,20,failed
 10,40,passed
 20,40,passed
 30,10,failed]

I would like to generate a scatter plot with the scores of the 1st test on the x axis, the scores of the 2nd test on the y axis, and use color (or marker) to indicate whether each individual passed or failed. I have achieved this with:

plt.scatter(x=df[column1], y=df[column2], c=df[column3])

The question is, how can I have a legend based on the color (or marker) and column3?

[red: failed
 blue: passed]
Iri
  • Thank you Sheldore for pointing out the possible duplication. Looking at bexi's solution though I feel it is worth keeping this question open. – Iri Jul 18 '19 at 07:46

1 Answer


Here's my suggestion: plot the failed and passed points separately to get their handles, which can then be passed to the legend.

fig = plt.figure()
ax1 = fig.add_subplot(111)

# Boolean masks selecting the rows for each outcome
is_passed = df[column3].eq('passed')
is_failed = df[column3].eq('failed')

passed = ax1.scatter(x=df.loc[is_passed, column1], y=df.loc[is_passed, column2], c='green')
failed = ax1.scatter(x=df.loc[is_failed, column1], y=df.loc[is_failed, column2], c='red')

ax1.legend(handles=[passed, failed], labels=['Passed', 'Failed'])
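
For reference, here is a self-contained variant of the same idea, as a minimal sketch rather than the only way to do it. The column names test1, test2 and result, and the colors dict, are made up for illustration (the question only refers to column1..column3). It loops over df.groupby and gives each scatter call a label, so ax.legend() can pick the entries up without explicit handles.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical column names; the data matches the example in the question.
df = pd.DataFrame({
    "test1": [10, 10, 20, 30],
    "test2": [20, 40, 40, 10],
    "result": ["failed", "passed", "passed", "failed"],
})

# Assumed color mapping, following the legend sketched in the question.
colors = {"failed": "red", "passed": "blue"}

fig, ax = plt.subplots()

# One scatter call per outcome; the label is what the legend displays.
for outcome, group in df.groupby("result"):
    ax.scatter(group["test1"], group["test2"], c=colors[outcome], label=outcome)

ax.set_xlabel("test1")
ax.set_ylabel("test2")
legend = ax.legend(title="result")

# Collected only so the result can be inspected without opening the figure.
legend_labels = [text.get_text() for text in legend.get_texts()]
```

To use a marker instead of a color, the same loop should work with a dict of marker styles passed as marker=... in place of c=....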
bexi