I'm reading the book Pandas for Everyone. In chapter 3, the author creates a scatter plot using the following code:

# create a color variable based on sex
def recode_sex(sex):
    if sex == 'Female':
        return 0
    else:
        return 1

tips['sex_color'] = tips['sex'].apply(recode_sex)

scatter_plot = plt.figure(figsize=(20, 10))
axes1 = scatter_plot.add_subplot(1, 1, 1)
axes1.scatter(
    x=tips['total_bill'],
    y=tips['tip'],

    # set the size of the dots based on party size
    # we multiply the values by 90 to make the points bigger
    # and to emphasize the differences
    s=tips['size'] * 90,

    # set the color for the sex
    c=tips['sex_color'],

    # set the alpha value so points are more transparent
    # this helps with overlapping points
    alpha=0.5
)

axes1.set_title('Total Bill vs Tip Colored by Sex and Sized by Size')
axes1.set_xlabel('Total Bill')
axes1.set_ylabel('Tip')

plt.show()

The plot looks like this:

[Image: the scatter plot produced by the code above]

My question is: how can I add a legend to the scatter plot?

Cody
  • Can you try `scatter_plot.legend()`? – r-beginners Oct 14 '20 at 06:28
  • Does this answer your question? [Matplotlib scatter plot legend](https://stackoverflow.com/questions/17411940/matplotlib-scatter-plot-legend) – funie200 Oct 14 '20 at 06:28
  • @funie200, I'm just starting out and none of this makes sense to me. Can you give me an example? – Cody Oct 14 '20 at 06:39
  • @r-beginners I'm getting a `No handles with labels found to put in legend` error. – Cody Oct 14 '20 at 06:51
  • @Cody Even though your question has been answered in the meantime, I suggest you start with a basic Matplotlib tutorial, like [the official tutorials](https://matplotlib.org/tutorials/). – funie200 Oct 14 '20 at 07:22
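
On the `No handles with labels found to put in legend` message mentioned in the comments: a legend only lists artists that were given a `label`, and the `scatter` call in the question has none. Here is a minimal sketch of that behaviour, using made-up points rather than the tips data:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Without a label, the scatter call contributes nothing to the legend.
ax.scatter([1, 2, 3], [1, 2, 3])
_, labels_before = ax.get_legend_handles_labels()
print(labels_before)  # []

# With a label, the legend has an entry to display.
ax.scatter([1, 2, 3], [3, 2, 1], label="second series")
_, labels_after = ax.get_legend_handles_labels()
print(labels_after)  # ['second series']

ax.legend()
```

The same applies to `scatter_plot.legend()` on the figure: it collects labeled artists from the axes, so it stays empty until at least one plotting call passes `label=...`.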

1 Answer

Here is a solution, based on Matplotlib's tutorial on scatter plots with legends. Looping over the dataset grouped by gender issues one `scatter` call per group, so each gender gets its own color and its own legend entry. The size legend is then built from the object returned by the last `scatter` call, using `legend_elements("sizes")`.

This is what I obtain with the dataset used in your example:

[Image: marker on scatter plot]

Here is the code:

import matplotlib.pyplot as plt
import seaborn as sns

# Read and group by gender
tips = sns.load_dataset("tips")
grouped = tips.groupby("sex")

# Show per group
fig, ax = plt.subplots()
for name, group in grouped:
    sc = ax.scatter(
        group["total_bill"],
        group["tip"],
        s=group["size"] * 20,
        alpha=0.5,
        label=name,
    )

# Add legends (one for gender, one for size)
ax.add_artist(ax.legend(title='Gender'))
# func undoes the * 20 scaling so the labels show the party size
ax.legend(
    *sc.legend_elements("sizes", num=6, func=lambda s: s / 20),
    loc="lower left",
    title="Size",
)
ax.set_title("Scatter with legend")

plt.show()
Leonard
  • I am just starting out with visualization. Can you provide a pure Matplotlib solution? – Cody Oct 14 '20 at 06:55
  • Do you mean without grouping by gender? This is already quite a pure Matplotlib solution, as it is directly adapted from a Matplotlib tutorial. – Leonard Oct 14 '20 at 06:56
  • Note that I imported `seaborn` just to load the data; it is not involved at all in the plotting solution. – Leonard Oct 14 '20 at 07:01
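
For the follow-up about avoiding the grouping step, here is a minimal sketch that keeps the single `scatter` call from the question and builds both legends with `legend_elements`. The small DataFrame is a hypothetical stand-in for the tips data so the example runs without seaborn, and the names `SCALE` and `sex_codes` are introduced only for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the tips dataset, so the sketch runs offline.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "sex": ["Female", "Male", "Male", "Male", "Female", "Male"],
    "size": [2, 3, 3, 2, 4, 4],
})

SCALE = 90  # same marker scaling factor as in the question

# Map each sex to the integer code used for coloring (Female -> 0, Male -> 1).
sex_codes = {"Female": 0, "Male": 1}
tips["sex_color"] = tips["sex"].map(sex_codes)

fig, ax = plt.subplots(figsize=(20, 10))
sc = ax.scatter(
    x=tips["total_bill"],
    y=tips["tip"],
    s=tips["size"] * SCALE,
    c=tips["sex_color"],
    alpha=0.5,
)

# Color legend: one handle per distinct value of c, in ascending order (0, 1),
# which matches the order of the keys in sex_codes.
color_handles, _ = sc.legend_elements(prop="colors")
sex_legend = ax.legend(color_handles, list(sex_codes), loc="upper left", title="Sex")
ax.add_artist(sex_legend)  # keep this legend when the second one is created

# Size legend: func undoes the scaling so the labels show the party size.
size_handles, size_labels = sc.legend_elements(
    prop="sizes", func=lambda s: s / SCALE
)
size_legend = ax.legend(size_handles, size_labels, loc="lower right", title="Size")

ax.set_title("Total Bill vs Tip Colored by Sex and Sized by Size")
ax.set_xlabel("Total Bill")
ax.set_ylabel("Tip")
```

Call `plt.show()` afterwards to display the figure. The sketch assumes both sexes occur in the data; if one were missing, the color handles would no longer line up with the two labels.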