Consider the following toy data:

    import pandas as pd
    import numpy as np
    from plotly import graph_objects as go
    from plotly.subplots import make_subplots

    np.random.seed(42)
    df = pd.DataFrame(
        {
            "val1": np.random.normal(0, 1, size=100),
            "val2": np.random.normal(5, 2, size=100),
            "cat": np.random.choice(["a", "b"], size=100),
        }
    )

which yields (top 5 rows):

| | val1 | val2 | cat |
|---|---|---|---|
| 0 | 0.496714 | 2.16926 | b |
| 1 | -0.138264 | 4.15871 | b |
| 2 | 0.647689 | 4.31457 | a |
| 3 | 1.52303 | 3.39545 | b |
| 4 | -0.234153 | 4.67743 | a |

My objective is to get two box plots each containing two boxes (one per category).
I use the following code:

    fig = make_subplots(rows=2, cols=1, subplot_titles=["Value 1 dist", "Value 2 dist"])
    fill_colors = {"a": "rgba(150, 25, 40, 0.5)", "b": "rgba(25, 150, 40, 0.5)"}
    for i, val in enumerate(["val1", "val2"]):
        for c in df["cat"].unique():
            dff = df[df["cat"] == c]
            fig.add_trace(
                go.Box(
                    y=dff[val],
                    x=dff["cat"],
                    boxmean="sd",
                    name=c,
                    showlegend=True if val=="val1" else False,
                    fillcolor=fill_colors[c],
                    line={"color": fill_colors[c]},
                ),
                row=i + 1,
                col=1,
            )

This brings me very close to the desired result.
Here are the things I would like to adjust:
- How do I get, programmatically, the first 2 (or `n`) colors used in Plotly's default color cycle, so that the result is consistent with other plots? Note that I hardcoded the colors.
- The legend on the left: is there a more programmatic way to have only a single legend? Note that I used `showlegend=True if val=="val1" else False`.
- Bonus: how can I control the order of the boxes (i.e., which category comes first)?
I have posted two related questions in the past (here and here), but the answers there didn't help me tune my plot the way I want.