
I have a relatively simple issue, but cannot find any answer online that addresses it. Starting from a simple boxplot:

import plotly.express as px
 
df = px.data.iris()

fig = px.box(
    df, x='species', y='sepal_length'
)

val_counts = df['species'].value_counts()

I would now like to add val_counts (in this dataset, 50 for each species) to the plot, preferably in one of the following places:

  • On top of the median line
  • On top of the max/min line
  • Inside the hoverbox

How can I achieve this?

JDDS
  • There is already an answer to this question, but it is an example of customizing a hover template. I understand that your wish is to annotate the boxplot with the count number. Is my understanding correct? If so, I will answer. – r-beginners Nov 08 '21 at 13:36
  • This is indeed correct. While I did not ask for it, a solution that takes into account annotations for grouped boxplots would be very useful to me still (and maybe to others). Based on this Github issue here, there are a few solutions (and an official fix), but only for bar charts: https://github.com/plotly/plotly.py/issues/356 – JDDS Nov 08 '21 at 16:33
  • You said you didn't need an answer, but after reading the comments in the accepted answer, I thought my example might be useful to you. You can use a loop to add the necessary number of objects as they are added. If you check [Colab](https://colab.research.google.com/drive/1X5-TmcIWzv_BrwuahscDkEV0kjiBacC8?usp=sharing), it will remove your comment. – r-beginners Nov 09 '21 at 09:04

2 Answers


The snippet below places the observation count (50 for every species in this dataset) just above the max line of each box, using fig.add_annotation:

for s in df.species.unique():
    fig.add_annotation(x=s,
                       y = df[df['species']==s]['sepal_length'].max(),
                       text = str(len(df[df['species']==s]['species'])),
                       yshift = 10,
                       showarrow = False
                      )


Complete code:

import plotly.express as px
 
df = px.data.iris()

fig = px.box(
    df, x='species', y='sepal_length'
)

for s in df.species.unique():
    fig.add_annotation(x=s,
                       y = df[df['species']==s]['sepal_length'].max(),
                       text = str(len(df[df['species']==s]['species'])),
                       yshift = 10,
                       showarrow = False
                      )
fig.show()
vestland
  • Thanks a lot! Quick follow-up question: how would you update the solution for a grouped boxplot (i.e. if 'setosa' were split into two boxplots, 'setosa_1' and 'setosa_2', etc.)? Is there a way to properly identify the x-axis position in that scenario as well? – JDDS Nov 08 '21 at 14:57

Using the same approach that I presented in this answer: Change Plotly Boxplot Hover Data

  • calculate all the measures a box plot calculates, plus the additional measure you want (count)
  • overlay bar traces on the box plot traces so that the hover shows all required measures

import plotly.express as px

df = px.data.iris()

# summarize data as per same dimensions as boxplot
df2 = df.groupby("species").agg(
    **{
        m
        if isinstance(m, str)
        else m[0]: ("sepal_length", m if isinstance(m, str) else m[1])
        for m in [
            "max",
            ("q75", lambda s: s.quantile(0.75)),
            "median",
            ("q25", lambda s: s.quantile(0.25)),
            "min",
            "count",
        ]
    }
).reset_index().assign(y=lambda d: d["max"] - d["min"])

# overlay bar over boxplot
fig = px.bar(
    df2,
    x="species",
    y="y",
    base="min",
    # show every summary column in the hover except the helper "y" and "species"
    hover_data={c: c not in ["y", "species"] for c in df2.columns},
    hover_name="species",
).update_traces(opacity=0.1).add_traces(px.box(df, x="species", y="sepal_length").data)
fig.show()


Rob Raymond
  • Thanks for your solution. It is not exactly what I was looking for, as the leftover opacity is a bit confusing for a publication, but I will keep it in mind nonetheless as a possibility. – JDDS Nov 08 '21 at 14:37
  • One thing you can do is reduce the impact of the bar by reducing its height. I've made it cover the whole box plot, but it can use a **base** just below the median and a **y** that places it just above the median. Happy to share if it helps. – Rob Raymond Nov 08 '21 at 15:08