I am creating a series of boxplots to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image, right); however, for some of them the x axis collapses slightly (see image, left) or strongly (see image, middle): https://i.stack.imgur.com/hS2rT.png

Looking at how seaborn draws a box/violin plot (https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py, line 961):

violin_data = remove_na(group_data[hue_mask])

I realized that this happens when there are too many NaNs.

Is there any way to prevent this collapsing through code alone? I do not want to modify my dataframe (e.g. by replacing the NaNs with zero).

Below you find my code:

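import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# pf_in and pf_out are the input and output file paths, defined elsewhere
boxp_df = pd.read_csv(pf_in, sep="\t", skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)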

The output is a differently sized plot per cancer type (depending on whether any category is completely NaN). I expect every plot to have the same width.

Update: trying the order parameter as suggested leads to the following output: https://i.stack.imgur.com/4wtRa.png

Maybe this toy example helps?

| Cat1 | Cat2 | Cat3 | Cat4 | Cat5  |
|------|------|------|------|-------|
| 3.93 |      | 0.52 |      | 6.01  |
| 3.34 |      | 0.89 |      | 2.89  |
| 3.39 |      | 1.96 |      | 4.63  |
| 1.59 |      | 3.66 |      | 3.75  |
| 2.73 |      | 0.39 |      | 2.87  |
| 0.08 |      | 1.25 |      | -0.27 |
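
As a minimal sketch, the toy table above can be loaded into a dataframe to check which categories are entirely NaN before plotting. It assumes the file is tab-separated (as in the `read_csv` call above) and that the empty cells are simply empty fields; the names `toy_tsv`, `toy_df` and `all_nan_columns` are made up for this illustration.

```python
import io

import pandas as pd

# Toy data from the table above, written as tab-separated text.
# Assumption: Cat2 and Cat4 are empty fields, which pandas reads as NaN.
toy_tsv = (
    "Cat1\tCat2\tCat3\tCat4\tCat5\n"
    "3.93\t\t0.52\t\t6.01\n"
    "3.34\t\t0.89\t\t2.89\n"
    "3.39\t\t1.96\t\t4.63\n"
    "1.59\t\t3.66\t\t3.75\n"
    "2.73\t\t0.39\t\t2.87\n"
    "0.08\t\t1.25\t\t-0.27\n"
)

toy_df = pd.read_csv(io.StringIO(toy_tsv), sep="\t")

# Columns in which every value is missing
all_nan_columns = toy_df.columns[toy_df.isna().all()].tolist()
print(all_nan_columns)  # ['Cat2', 'Cat4']
```

Passing `toy_df` to the plotting code above then gives a small, shareable test case without touching the real data.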

Update: Apparently the problem is not the data but the length of the title: https://github.com/matplotlib/matplotlib/issues/4413

Therefore I would close the question. @Diziet, should I delete it, or might my issue help others? Sorry for not including the line below in the code example:

ax.set_title("VERY LONG TITLE", fontsize=20)
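
One possible workaround, sketched under assumptions rather than tested against the original data: if the axes shrink because `tight_layout()` makes room for a title that is wider than the plot, wrapping the title onto several lines keeps it narrower. The placeholder title text, the `width=40` value and the `Agg` backend are illustrative choices and not part of the original code.

```python
import textwrap

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Placeholder standing in for the real, very long title
long_title = "VERY LONG TITLE " * 8

# Break the title into lines of at most 40 characters (arbitrary width)
wrapped_title = textwrap.fill(long_title, width=40)

fig, ax = plt.subplots(figsize=(10, 10))
ax.set_title(wrapped_title, fontsize=20)
plt.tight_layout()
```

A suitable `width` depends on the figure size and font size, so it probably needs adjusting per plot.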
Ivo Leist
  • I'm not entirely clear how your code could have generated the figure you show at the beginning. According to your code, you should always get a 10x10 figure, regardless of the content of your dataframe(s) – Diziet Asahi Jul 30 '19 at 13:46
  • Ah, good catch, this might be confusing for others as well. I screenshotted the two plots and uploaded them as one figure, eliminating as much white space as possible. I am going to upload another one – Ivo Leist Jul 30 '19 at 13:56
  • Your toy dataset and code do not reproduce the issue. Please review [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) – Diziet Asahi Jul 30 '19 at 19:58
  • @Diziet I was trying to reproduce the issue with the toy dataset as well... that is when I realized that the issue is not the data but the plot title (see update). Anyway, thank you for pushing me to provide a toy example – Ivo Leist Jul 30 '19 at 23:01

1 Answer

It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.

For instance:

import seaborn as sns

tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

Diziet Asahi
  • Thank you for your first thought, but unfortunately, this did not solve my problem (see updated question). When I find the time I will provide some example datasets. Where would be a good place to upload/host them? – Ivo Leist Jul 30 '19 at 13:37
  • If the datasets are very large, it would be better to create a "toy" dataset (i.e. using things like `np.random.random()` or `np.random.normal()`) that has the same general shape and reproduces the problem you're facing. See [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) and [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Diziet Asahi Jul 30 '19 at 13:44
  • I am about to create a toy dataset - stay tuned – Ivo Leist Jul 30 '19 at 13:47