
I'm trying to make a grouped boxplot using Seaborn (Reference), and the boxes are all incredibly narrow -- too narrow to see the grouping colors.

g = seaborn.factorplot("project_code",y="num_mutations",hue="organ",
        data=grouped_donor, kind="box", aspect=3)


If I zoom in, or stretch the graphic several times the width of my screen, I can see the boxes, but obviously this isn't useful as a standard graphic.

This appears to depend on the amount of data: if I plot only the first 500 points (of 6000), I get visible but small boxes. It might specifically be a function of the high variance of my data; according to the matplotlib boxplot documentation,

The default [width] is 0.5, or 0.15 × (distance between extreme positions) if that is smaller.
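The quoted rule can be written out as a tiny function to see what it gives for a few layouts. This is a sketch of the documented sentence only, not matplotlib's actual implementation. It assumes at least two distinct box positions, and seaborn may pass its own widths rather than relying on this default.

```python
def documented_default_width(positions):
    """Box width per the quoted docs: 0.5, or 0.15 * (distance between
    extreme positions) if that is smaller. Assumes >= 2 distinct positions."""
    span = max(positions) - min(positions)
    return min(0.5, 0.15 * span)


print(documented_default_width([0, 1]))     # two boxes one unit apart: 0.15
print(documented_default_width(range(20)))  # twenty boxes: capped at 0.5
```

With many boxes spread along the axis, the rule caps at 0.5, so by itself it would not explain very narrow boxes.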

Regardless of the reason, there's plenty of room on the graph itself for wider boxes, if I could just widen them.

Unfortunately, the boxplot keyword `widths`, which controls the box width, isn't a valid factorplot keyword, and I can't find a matplotlib function that will change the width of a bar or box outside of the plotting function itself. I can't even find anyone discussing this; the closest I found was boxplot line width. Any suggestions?

mwaskom
Lanthala
  • Can you link to the plot you're seeing? Seaborn boxplots take up about as much horizontal space as they could so I'm not sure what the problem could be. – mwaskom Jun 26 '15 at 16:01
  • Also if you can't share your actual data please try to share some code that will generate random data that reproduces the problem; doing so might also give you insight into what the issue is. – mwaskom Jun 26 '15 at 16:02
  • I can't post pictures, but I have [a screenshot of it here](https://www.dropbox.com/s/2hzf0yta4cp4kxg/bad_grouped_boxplot.png?dl=0). And a pickled dataframe that creates that plot when run with the code in my question can be downloaded from [my dropbox](https://www.dropbox.com/s/pg3vtkuu28gfyiq/grouped_boxplot_data.p?dl=0). – Lanthala Jun 26 '15 at 19:54
  • It looks like the hue levels are perfectly nested within the x variable; I think that is your problem. Just remove `hue="organ"`. – mwaskom Jun 26 '15 at 20:03
  • Also, the above screenshot was taken after running plt.yscale('log') to rescale the axis. – Lanthala Jun 26 '15 at 20:08
  • You're right, removing hue="organ" made all the boxes expand to fill the available width! Does this mean there's no way to use factorplot to color-code my projects by organ? – Lanthala Jun 26 '15 at 20:14
  • If you pass a color palette name to the `palette` keyword argument it will color the `x` variable. – mwaskom Jun 26 '15 at 20:20
  • Unfortunately, in this case color-coding by X won't help me, because each organ is associated with several projects. I was hoping to use grouped boxplots to make it clear which project is from which organ, but it looks like no matter which way I group things (either hue=organ or hue=project_id), the boxes end up too thin. Thank you for your help though! – Lanthala Jun 26 '15 at 20:31
  • ...wait, I think I see what you mean. I can hard-code a "palette" which colors the projects by organ, and pass it into factorplot. Tedious, but it'll work! Thank you! – Lanthala Jun 26 '15 at 20:44
  • `palette = df["organ"].map(pal_dict)` where pal_dict has organs as keys and colors as values should do the trick. – mwaskom Jun 26 '15 at 22:17
  • That did, in fact, do the trick! I added a legend using the code from the last answer [here](http://stackoverflow.com/questions/26558816/matplotlib-scatter-plot-with-legend), and everything's exactly how I imagined it :) – Lanthala Jun 26 '15 at 23:40
  • Would you mind elaborating on how you added a legend? I am having the same problem with seaborn boxplot. I solved it with the solution in this post (removing 'hue'), but I cannot seem to add a legend... – Nicole Goebel Mar 23 '16 at 21:50
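The palette trick from the comments boils down to: give each organ a color, then give each project the color of its organ, in the same order the projects appear on the x axis. Below is a plain-Python sketch of that idea. The project codes, organ names, hex colors and the helper `colors_by_group` are all hypothetical; mwaskom's pandas one-liner does the same job when the DataFrame has exactly one row per project, in plot order.

```python
# Hypothetical mapping: every project belongs to exactly one organ.
project_organ = {
    "PRJ-A": "liver",
    "PRJ-B": "liver",
    "PRJ-C": "lung",
    "PRJ-D": "skin",
}

# Hypothetical organ -> color table (any matplotlib color spec would do).
organ_colors = {
    "liver": "#1f77b4",
    "lung": "#ff7f0e",
    "skin": "#2ca02c",
}


def colors_by_group(order, item_group, group_colors):
    """Return one color per item in `order`, taken from that item's group."""
    return [group_colors[item_group[item]] for item in order]


# The x-axis order and the palette must line up element by element.
order = sorted(project_organ)
palette = colors_by_group(order, project_organ, organ_colors)
print(list(zip(order, palette)))
```

The resulting `order` and `palette` would then be passed to the box-plot call that has no `hue`, so each project keeps a full-width box while still being colored by organ.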

2 Answers


When using sns.boxplot, adding dodge=False solves this problem (as of version 0.9).

sns.factorplot() has been deprecated since version 0.9 and replaced with catplot(), which also accepts the dodge parameter.
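A minimal sketch of the dodge=False fix, assuming seaborn 0.9 or later (where catplot exists) together with numpy, pandas and matplotlib. The project codes, organs and lognormal mutation counts are made-up stand-ins for the question's data, chosen so that each project belongs to exactly one organ, i.e. the hue levels are nested within x.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data: each project belongs to exactly one organ.
rng = np.random.default_rng(0)
project_organ = {"PRJ-A": "liver", "PRJ-B": "liver", "PRJ-C": "lung", "PRJ-D": "skin"}
rows = [
    {"project_code": project, "organ": organ, "num_mutations": count}
    for project, organ in project_organ.items()
    for count in rng.lognormal(mean=5, sigma=1.5, size=50)
]
df = pd.DataFrame(rows)

# dodge=False: don't split each x slot into one narrow sub-slot per hue level.
g = sns.catplot(x="project_code", y="num_mutations", hue="organ", data=df,
                kind="box", dodge=False, aspect=3)
g.set(yscale="log")  # the question's plot also used a log y axis
```

Because each project has only one organ, there is nothing to dodge; with dodge=False each box should be drawn at the full slot width, colored by its organ, and catplot should add the organ legend itself.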

ilyas

For future reference, here are the relevant bits of code that produce the correct figure with a legend. (This is missing important things, such as the data, and won't run as-is, but hopefully it shows the tricky parts.)

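import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import seaborn as sns

def custom_legend(colors, labels, legend_location='upper left', legend_boundary=(1, 1)):
    # Build one colored proxy rectangle per label and draw a legend from them
    recs = [mpatches.Rectangle((0, 0), 1, 1, fc=c) for c in colors]
    plt.legend(recs, labels, loc=legend_location, bbox_to_anchor=legend_boundary)

# grouped_samples: DataFrame with columns 'id' (project), 'type' (organ), 'num_mutations'
# id_list: the order in which the projects should appear on the x axis

# Color boxplots by organ: one color per organ...
organ_list = sorted(grouped_samples['type'].unique())
colors = sns.color_palette("Paired", len(organ_list))
color_dict = dict(zip(organ_list, colors))

# ...then one color per project, in the same order as the x axis
id_to_organ = grouped_samples.drop_duplicates('id').set_index('id')['type'].to_dict()
organ_palette = [color_dict[id_to_organ[i]] for i in id_list]

# One box per project (no hue), so the boxes use the full width.
# factorplot became catplot (and size became height) in seaborn 0.9.
g = sns.catplot(x="id", y="num_mutations", data=grouped_samples, order=id_list,
                kind="box", height=7, aspect=3, palette=organ_palette)
sns.despine(left=True)
plot_setup_pre()  # the author's own axis-setup helper (not shown)
plt.yscale('log')
custom_legend(colors, organ_list)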
Lanthala