Seaborn showing values in legend not present in Pandas column

Question

I'm generating a scatterplot for a Pandas DataFrame, data, which contains (among other columns) a numeric column 'year' with the unique values

array([2010., 2011., 2012., 2013., 2014., 2015., 2016., 2017., 2018.])

as returned by data.year.unique().

Displaying the plot like this:

ax = sns.scatterplot(x='x', y='y', hue='name', size='year', data=data, palette=sns.color_palette('deep', 7))

generates a legend whose size entries for year are five markers labelled 2008 to 2020.

This is misleading, as the plot only contains data from 2010 to 2018.
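A possible workaround for the out-of-range labels, sketched under assumptions rather than taken from the seaborn documentation: the brief numeric legend appears to be built from rounded "nice" values that can fall outside the data, so after plotting you can drop every numeric legend entry outside the observed range. The helper drop_out_of_range_entries is hypothetical. It works on plain (handles, labels) lists, so it can be tried without drawing a plot; the labels below are made-up stand-ins, not the exact ones from the plot described above.

```python
def drop_out_of_range_entries(handles, labels, lo, hi):
    """Hypothetical helper: drop legend entries whose label is a number outside [lo, hi].

    Labels that do not parse as numbers (section titles such as "year",
    hue levels such as names) are always kept.
    """
    kept_handles, kept_labels = [], []
    for handle, label in zip(handles, labels):
        try:
            value = float(label)
        except ValueError:
            value = None  # non-numeric label: keep it unchanged
        if value is None or lo <= value <= hi:
            kept_handles.append(handle)
            kept_labels.append(label)
    return kept_handles, kept_labels


if __name__ == "__main__":
    # Made-up labels shaped like a seaborn legend with hue and size sections;
    # real handles would be matplotlib artists, strings stand in for them here.
    labels = ["name", "A", "B", "year", "2008", "2012", "2016", "2020"]
    handles = [f"h_{label}" for label in labels]
    print(drop_out_of_range_entries(handles, labels, 2010, 2018)[1])
    # ['name', 'A', 'B', 'year', '2012', '2016']
```

On the plot above this would be applied as handles, labels = ax.get_legend_handles_labels() followed by ax.legend(*drop_out_of_range_entries(handles, labels, data.year.min(), data.year.max())). That assumes seaborn drew its legend entries as labelled artists on ax, which may not hold in every seaborn version, and that no hue level (name) is itself a number.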

I've tried passing a (min, max) tuple to sns.scatterplot, as described in the documentation, but to no avail.

Converting the column 'year' to a categorical dtype does make the legend show the correct range of years, but it then produces an entry for every single year, which is unnecessary and takes up a lot of space.

I've also tried the solution from this related thread, but it doesn't change the range of the legend entries.

How can I force seaborn to show the actual range of values in the legend? Alternatively, if this only works with a categorical column, how can I show only every second entry in the legend?
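For the categorical variant, a similar sketch keeps the section title and every second year entry. The helper thin_trailing_section is hypothetical, and it assumes the year entries form the last section of the legend, which should be the case here because hue entries are listed before size entries and there is no style variable.

```python
def thin_trailing_section(handles, labels, title, step=2):
    """Hypothetical helper: keep every `step`-th legend entry after the entry labelled `title`.

    Assumes the section starting at `title` runs to the end of the legend.
    Entries up to and including `title` are kept unchanged.
    """
    handles, labels = list(handles), list(labels)
    if title not in labels:
        return handles, labels  # nothing to thin
    start = labels.index(title) + 1
    keep = list(range(start)) + list(range(start, len(labels), step))
    return [handles[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    # Stand-in labels: two hue levels, then one entry per year 2010..2018.
    labels = ["name", "A", "B", "year"] + [str(y) for y in range(2010, 2019)]
    handles = [f"h_{label}" for label in labels]
    print(thin_trailing_section(handles, labels, "year")[1])
    # ['name', 'A', 'B', 'year', '2010', '2012', '2014', '2016', '2018']
```

It would be applied the same way: pass the lists from ax.get_legend_handles_labels() together with the title "year", then hand the result to ax.legend.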

[Edit seaborn legend](https://stackoverflow.com/questions/45201514/edit-seaborn-legend) I hope this answer is helpful. (r-beginners, May 22 '20 at 06:56)
Thanks! Indeed, the second approach in that answer worked for me. I'm still wondering, though, why this problem occurs in the first place and why the scale_order=(min, max) approach doesn't work. (Texfy, May 22 '20 at 07:44)
