
I made a scatter plot with seaborn from three columns ('Category', 'Installs' and 'Gross Income'), using the Category column for the hue mapping. However, in the legend, besides the Category entries I want to appear, there is a big smudge at the end showing one of the columns used in the scatter plot, Installs. I'd like to remove this element, but after searching through other questions here and the seaborn and matplotlib documentation, I'm at a loss on how to proceed.

Here is a snippet of the code I'm working with:

import matplotlib.pyplot as plt
import seaborn as sns

# Axis-label font size is read from rcParams when the Axes is created, so set it first
plt.rcParams["axes.labelsize"] = 15

fig, ax = plt.subplots(figsize=(12,6))

ax = sns.scatterplot(x="Installs", y="Gross Income", data=comp_income_inst, hue='Category',
                     palette=sns.color_palette("cubehelix",len(comp_income_inst)),
                     size='Installs', sizes=(100,5000), legend='brief', ax=ax)

ax.set(xscale="log", yscale="log")
ax.set(ylabel="Average Income")
ax.set_title("Distribution showing the Earnings of Apps in Various Categories\n", fontsize=18)

# Move the legend to an empty part of the plot
plt.legend(loc='upper left', bbox_to_anchor=(-0.2, -0.06),fancybox=True, shadow=True, ncol=5)
#plt.legend(loc='upper left')

plt.show()

This is the result of the code above; notice the smudge in the legend in the lower right corner.

Ali AzG
Daniel1234
  • Possible duplicate of [Seaborn: title and subtitle placement](https://stackoverflow.com/questions/52914441/seaborn-title-and-subtitle-placement) – Diziet Asahi Nov 22 '18 at 20:45

1 Answer


Actually, that is not a smudge but the size legend for your `size='Installs'` mapping. Because the bubble sizes (100 to 5000) are so large, the size markers overlap one another in the legend, creating the "smudge" effect. By default, seaborn combines the color (hue) and size entries into a single legend.
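A rough worked check of why the entries collide, assuming matplotlib's convention for `Axes.scatter` that `s` is a marker area in points squared (1 pt = 1/72 inch), and assuming seaborn passes the `sizes=(100, 5000)` range through to matplotlib as those `s` values:

```python
import math

# matplotlib's scatter `s` is an area in points**2, so a marker's
# width in points is roughly the square root of that area.
def marker_width_pt(area_pt2):
    return math.sqrt(area_pt2)

smallest = marker_width_pt(100)   # lower end of sizes=(100, 5000)
largest = marker_width_pt(5000)   # upper end of sizes=(100, 5000)
print(f"{smallest:.1f} pt, {largest:.1f} pt ({largest / 72:.2f} in)")
# prints: 10.0 pt, 70.7 pt (0.98 in)
```

With matplotlib's default 10 pt font, the largest size marker is about seven times as tall as a legend label line, so neighbouring size entries run into each other.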

But rather than removing the size markers as you intend, consider keeping them, since readers may need to know what range of Installs each bubble size represents. Hence, consider splitting the single legend into two legends, and use `borderpad` and the `prop` font size to fit the bubbles and labels.
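The two-legend trick is plain matplotlib: each legend call replaces the Axes' current legend, so the first legend has to be re-attached with `add_artist`. A minimal matplotlib-only sketch of that pattern, with made-up points and hand-built proxy handles (the labels "A"/"B", the colors and the areas are illustrative only, not from the question's data):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

fig, ax = plt.subplots()

# Made-up points: color encodes a category, marker area encodes a count
ax.scatter([1, 2, 3, 4], [2, 1, 4, 3],
           c=["tab:blue", "tab:orange", "tab:blue", "tab:orange"],
           s=[50, 150, 300, 600])

# Proxy handles for the color legend, one entry per category
color_handles = [
    Line2D([], [], marker="o", linestyle="", color="tab:blue", label="A"),
    Line2D([], [], marker="o", linestyle="", color="tab:orange", label="B"),
]
# Proxy handles for the size legend; Line2D markersize is a width in
# points, so take the square root of the scatter area (points**2)
size_handles = [
    Line2D([], [], marker="o", linestyle="", color="gray",
           markersize=area ** 0.5, label=str(area))
    for area in (50, 600)
]

col_lgd = ax.legend(handles=color_handles, loc="upper left", title="Category")
# This second call replaces col_lgd as the Axes' legend...
size_lgd = ax.legend(handles=size_handles, loc="lower right", title="Size")
# ...so re-attach the first legend as an ordinary artist
ax.add_artist(col_lgd)
```

The answer's code below does the same thing with `plt.legend` (which acts on the current Axes) and `plt.gca().add_artist(col_lgd)`.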

Data (seeded, random data)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

categs = ['GAME', 'EDUCATION', 'FAMILY', 'WEATHER', 'ENTERTAINMENT', 'PHOTOGRAPHY', 'LIFESTYLE',
          'SPORTS', 'PRODUCTIVITY', 'COMMUNICATION', 'PERSONALIZATION', 'HEALTH_AND_FITNESS', 'FOOD_AND_DRINK', 'PARENTING',
          'MAPS_AND_NAVIGATION', 'TOOLS', 'VIDEO_PLAYERS', 'BUSINESS', 'AUTO_AND_VEHICLES', 'TRAVEL_AND_LOCAL',
          'FINANCE', 'MEDICAL', 'ART_AND_DESIGN', 'SHOPPING', 'NEWS_AND_MAGAZINES', 'SOCIAL', 'DATING', 'BOOKS_AND REFERENCES',
          'LIBRARIES_AND_DEMO', 'EVENTS']

np.random.seed(11222018)
comp_income_inst = pd.DataFrame({'Category': categs,
                                 'Installs': np.random.randint(100, 5000, 30),
                                 'Gross Income': np.random.uniform(0, 30, 30) * 100000
                                }, columns=['Category', 'Installs', 'Gross Income'])

Graph

# SET LABEL SIZE BEFORE CREATING THE AXES (rcParams ARE READ AT CREATION)
plt.rcParams["axes.labelsize"] = 15

fig, ax = plt.subplots(figsize=(13,6))

ax = sns.scatterplot(x="Installs", y="Gross Income", data=comp_income_inst, hue='Category',
                     palette=sns.color_palette("cubehelix",len(comp_income_inst)),
                     size='Installs', sizes=(100, 5000), legend='brief', ax=ax)

ax.set(xscale="log", yscale="log")
ax.set(ylabel="Average Income")
ax.set_title("Distribution showing the Earnings of Apps in Various Categories\n", fontsize=20)

# EXTRACT CURRENT HANDLES AND LABELS
h,l = ax.get_legend_handles_labels()

# COLOR LEGEND (FIRST 30 ITEMS)
col_lgd = plt.legend(h[:30], l[:30], loc='upper left', 
                     bbox_to_anchor=(-0.05, -0.50), fancybox=True, shadow=True, ncol=5)

# SIZE LEGEND (LAST 5 ITEMS)
size_lgd = plt.legend(h[-5:], l[-5:], loc='lower center', borderpad=1.6, prop={'size': 20},
                      bbox_to_anchor=(0.5,-0.45), fancybox=True, shadow=True, ncol=5)

# ADD FORMER (OVERWRITTEN BY LATTER)
plt.gca().add_artist(col_lgd)

plt.show()

Output

Two Legend Plot Output

You could even apply seaborn's default theme with `sns.set()` just before plotting:

Seaborn Plot Output
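The slices `h[:30]` and `h[-5:]` hard-code entry counts, so they break if the number of categories changes. They can also be off by one if your seaborn version inserts the variable names ("Category", "Installs") as subtitle entries in the combined legend; run `print(l)` to see what your version produces. A sketch of a hypothetical helper, `split_legend` (not a seaborn or matplotlib function), that splits at the size subtitle label instead, demonstrated on stand-in strings rather than real artists:

```python
def split_legend(handles, labels, size_title, hue_title=None):
    """Hypothetical helper: split combined legend entries into color and size groups.

    Entries before the label `size_title` form the color group, entries
    after it form the size group, and the `size_title` entry itself is
    dropped. If `hue_title` is given, an entry with that label is also
    dropped from the color group. Raises ValueError if `size_title` is
    not among the labels.
    """
    cut = labels.index(size_title)
    color = [(h, lab) for h, lab in zip(handles[:cut], labels[:cut])
             if lab != hue_title]
    size = list(zip(handles[cut + 1:], labels[cut + 1:]))
    # Unzip the (handle, label) pairs back into separate lists
    col_h = [h for h, _ in color]
    col_l = [lab for _, lab in color]
    size_h = [h for h, _ in size]
    size_l = [lab for _, lab in size]
    return (col_h, col_l), (size_h, size_l)


# Stand-in data: strings instead of matplotlib artists, with labels shaped
# the way the combined legend is assumed to look (verify with print(l))
labels = ["Category", "GAME", "EDUCATION", "Installs", "1000", "2000"]
handles = ["h_" + lab for lab in labels]
(col_h, col_l), (size_h, size_l) = split_legend(
    handles, labels, size_title="Installs", hue_title="Category")
print(col_l, size_l)  # ['GAME', 'EDUCATION'] ['1000', '2000']
```

On the real plot you would call `h, l = ax.get_legend_handles_labels()`, split with `split_legend(h, l, 'Installs', 'Category')`, and pass each pair to `plt.legend` as in the answer. Passing only the color pair, and never creating the size legend, gives what was originally asked: a legend without the Installs entries.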

Parfait
  • Thanks a mill @Parfait!!! This looks much better than what I intended to do. You are quite right that it is helpful to keep the Installs legend; it gives perspective to the graph. Thanks again. However, when I ran your code it showed only the Installs legend and left out the Category legend. Also, on my system it still shows three dots per entry, unlike the more elegant single colored dot in your output. What can I do about that as well? – Daniel1234 Nov 23 '18 at 09:33
  • Weirdly enough, when I switched the order, creating `size_lgd` first and then `col_lgd`, calling `plt.gca().add_artist(size_lgd)` produced both legends. Still, it would be helpful to know how I can change my output from three colored circles per entry to one like yours. All the same, thanks for the help. I really appreciate it. – Daniel1234 Nov 23 '18 at 09:53
  • Hmmm... I wonder if we are having version issues. I just upgraded both matplotlib and seaborn (`pip install --upgrade modulename`) and this solution still reproduces. It could also be a data issue. Can you reproduce the legend and color issue with my posted data example? If so, try posting a few [sample rows](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) of your actual data. – Parfait Nov 23 '18 at 15:57
  • I think it had more to do with some default settings on my system. I was able to trace the settings that needed adjustment and reproduce the appearance of your output. In both `size_lgd` and `col_lgd` I added `markerscale=0.3, scatterpoints=1`, which changed the number of scatter points shown per entry from three to one. Thanks again, you were a big help. – Daniel1234 Nov 24 '18 at 22:23
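The fix in the last comment uses two `legend` keyword arguments: `scatterpoints` (how many markers each scatter entry shows) and `markerscale` (legend marker size relative to the plotted markers). Matplotlib 1.x defaulted `legend.scatterpoints` to 3 and 2.0 changed it to 1, so an older install or a custom style is a plausible cause of the three dots; that is an assumption, since the thread does not say which. A minimal sketch with made-up data showing both the global rcParams route and the per-legend route:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Option 1: set the default for every legend created afterwards
plt.rcParams["legend.scatterpoints"] = 1

fig, ax = plt.subplots()
# Made-up points with very large marker areas, similar to sizes=(100, 5000)
ax.scatter([1, 2], [1, 2], s=[100, 5000], label="big markers")

# Option 2: pass the settings to this one legend only; markerscale=0.3
# draws the legend marker at 0.3 times the plotted marker's width
lgd = ax.legend(scatterpoints=1, markerscale=0.3)
```

Passing the keywords per legend, as in the comment, leaves other plots untouched; the rcParams route is handy if every legend on the system should show a single marker.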