Plotly line chart with confidence interval using groupby

Question

I'd like to plot time series simulation data as a mean with confidence intervals and compare multiple scenarios. Using the pandas groupby() and agg() functions, I calculate the mean and the confidence interval (upper and lower limits); see the sample data (the actual data can be retrieved from my GitHub).

Edit: I added the raw data and the code (as a Jupyter notebook) to the git repository.
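For readers without access to the notebook, here is a minimal sketch of what such an aggregation could look like. It assumes a normal-approximation interval (mean ± 1.96 · std / sqrt(n)); the notebook may compute the interval differently. The helper name aggregate_with_ci, the column names and the random demo data are made up for illustration.

```python
import numpy as np
import pandas as pd


def aggregate_with_ci(raw, group_cols, value_col, z=1.96):
    """Mean, std, count and an approximate 95% CI per group (assumed formula)."""
    stats = raw.groupby(group_cols)[value_col].agg(["mean", "std", "count"])
    half_width = z * stats["std"] / np.sqrt(stats["count"])
    stats["ci95_lo"] = stats["mean"] - half_width
    stats["ci95_hi"] = stats["mean"] + half_width
    return stats


# Hypothetical raw data: 3 ticks x 2 first_factor values x 2 flags x 5 runs
rng = np.random.default_rng(0)
idx = pd.MultiIndex.from_product(
    [range(3), [0.0, 1.0], [False, True], range(5)],
    names=["tick", "first_factor", "second_factor", "run"],
)
raw = idx.to_frame(index=False)
raw["value"] = rng.normal(0.5, 0.02, size=len(raw))

# The result is indexed by (tick, first_factor, second_factor)
agg = aggregate_with_ci(raw, ["tick", "first_factor", "second_factor"], "value")
```

The result carries a three-level MultiIndex, which is why a later reset_index() is needed before handing it to plotly express.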

Plotting this data for one specific parameter combination (selecting data via df = df.loc[(slice(None),1, True)]) seems simple enough:

import plotly.graph_objects as go

myFig = go.Figure([
# Mean line
go.Scatter(
    name='Mean',
    #x=df['tick'],
    y=df['mean'],
    mode='lines',
    line=dict(color='rgb(31, 119, 255)'),
),
# Upper bound of the confidence interval (invisible line)
go.Scatter(
    name='Confidence interval',
    #x=df['tick'],
    y=df['ci95_hi'],
    mode='lines',
    marker=dict(color="#644"),
    line=dict(width=0),
    showlegend=True
),
# Lower bound, filled up to the previous (upper bound) trace
go.Scatter(
    name='Confidence interval',
    #x=df['tick'],
    y=df['ci95_lo'],
    marker=dict(color="#448"),
    line=dict(width=0),
    mode='lines',
    fillcolor='rgba(130, 68, 68, 0.5)',
    fill='tonexty',
    showlegend=True
)
])
myFig.update_layout(
    xaxis_title='X axis',
    yaxis_title='Y axis',
    title='Continuous, variable value error bars',
    hovermode="x"
    )
myFig.show()

This code gives me that beautiful plot. The issue is that I do not know how to properly plot the grouped data. When I don't select a subset, all data is plotted at once. Therefore I tried to use color, facet_col and facet_row, which I could get working using

px.line(reset_df, x = "tick", y="mean", color="first_factor",facet_col="second_factor")

(Since plotly apparently can't handle a MultiIndex DataFrame, I used reset_index() first to get a DataFrame with a 'RangeIndex'.) The issue with the latter approach is that I'm now missing the confidence interval and don't know how to add it (cf. this plot).

How can I have both the CI and the grouped data within one plot? If this is not possible with pandas, is it possible with bokeh? Thank you very much.

Does [Plotly: How to make a figure with multiple lines and shaded area for standard deviations?](https://stackoverflow.com/questions/61494278/plotly-how-to-make-a-figure-with-multiple-lines-and-shaded-area-for-standard-de/61501980#61501980) answer your question? Or at least produce a figure resembling your desired output? — vestland, Aug 25 '21 at 21:16
That definitely helps. Thank you! Is it possible to use a facet in addition to the traces in order to add more dimensions? Arranging the plots manually in a grid is quite inconvenient. — FrostyFrog, Aug 26 '21 at 07:20
I'm still not 100% sure what your data looks like, and what you're trying to achieve. Could you share a [sample](https://stackoverflow.com/questions/63163251/pandas-how-to-easily-share-a-sample-dataframe-using-df-to-dict/63163254#63163254) of your data and make your code snippet reproducible? — vestland, Aug 26 '21 at 07:56
I added the full data set and code to the linked github and specified the issue within the code. — FrostyFrog, Aug 26 '21 at 11:38

score 1 · Answer 1 · answered Aug 26 '21 at 21:03

(This is a work in progress.)

It's still not 100% clear to me how you'd like to display your data here. In your code sample:

px.line(reset_df, x = "tick", y="mean", color="first_factor",facet_col="second_factor")

... you're using first_factor and second_factor without saying what those are. And judging by the column names in your dataframe it could be almost any of:

['mean', 'median', 'count', 'std', 'min', 'max', 'ci95_lo', 'ci95_hi']

... or your index:

'(1.0, 0.0, False)'

But since it seems clear that you'd like to display a confidence interval around a mean across some categories, I'm guessing that this is what you're looking for:

Plot 1:

Plot 2: Zoomed in on first plot

The way to get there is full of pitfalls, and I'm not going to waste anyone's time talking about the details if this is not what you're looking for. But if it is, I'd be happy to explain everything.

Complete code:

import pandas as pd
import plotly.express as px

df = pd.read_json("https://raw.githubusercontent.com/thefeinkoster/plotly_issue/main/aggregatedData.json")
df = df.reset_index()
df = pd.concat([df, df['index'].str.split(',', expand = True)], axis = 1)
df = df.rename(columns = {0:'ix', 1:'scenario', 'ci95_lo':'lo', 'ci95_hi':'hi'})

df['ix'] = [int(float(i[1:])) for i in df['ix']]
df = df[df[2]==' True)']

dfp=pd.melt(df, id_vars=['ix', 'scenario'], value_vars=df.columns[-12:])
dfp = dfp[dfp['variable'].isin(['ix', 'mean', 'lo', 'hi'])] 

dfp = dfp.sort_values(['ix', 'variable'], ascending = False)

fig = px.line(dfp, x = 'ix', y = 'value', color = 'variable', facet_row = 'scenario')
fig.update_layout(height = 800)

fig.update_traces(name = 'interval', selector = dict(name = 'hi'))
fig.update_traces(fill = 'tonexty')
fig.update_traces(fillcolor = 'rgba(0,0,0,0)', selector = dict(name = 'mean'))
fig.update_traces(fillcolor = 'rgba(0,0,0,0)', line_color = 'rgba(44, 160, 44, 0.5)',
                  showlegend = False, selector = dict(name = 'lo'))

fig.update_yaxes(range=[0.44, 0.54])
f = fig.full_figure_for_development(warn=False)
fig.show()
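The code above recovers the index levels by splitting strings such as '(1.0, 0.0, False)' on commas. As an alternative sketch (not part of the original answer), the same labels could be parsed with ast.literal_eval, which gives back proper floats and booleans. The function name expand_tuple_index, the level names and the tiny demo frame are assumptions made for illustration.

```python
import ast

import pandas as pd


def expand_tuple_index(df, names=("tick", "first_factor", "second_factor")):
    """Turn an index of strings like '(1.0, 0.0, False)' into a MultiIndex."""
    tuples = [ast.literal_eval(label) for label in df.index]
    out = df.copy()
    out.index = pd.MultiIndex.from_tuples(tuples, names=list(names))
    return out


# Hypothetical frame shaped like the JSON export (stringified tuple index)
demo = pd.DataFrame(
    {"mean": [0.50, 0.51, 0.49, 0.52]},
    index=[
        "(1.0, 0.0, False)",
        "(1.0, 0.0, True)",
        "(2.0, 0.0, False)",
        "(2.0, 0.0, True)",
    ],
)
parsed = expand_tuple_index(demo)

# Flatten for plotly express and keep only the rows where the flag is True
flat = parsed.reset_index()
subset = flat[flat["second_factor"].astype(bool)]
```

This avoids comparing against string fragments like ' True)', at the cost of requiring every label to be a valid Python literal.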

Sorry for the still unclear question. To clarify first_factor and second_factor: these are actually the second and third index levels (the first index level being the tick). Your example is definitely close to what I'd like to achieve. But in addition to the "scenario" of [0.0 -1], I'd like to plot the data for the "second_factor" independently, e.g. first_factor = 0.0 -> second_factor = yes / no. I would also like to have multiple lines "within one plot", e.g. the first_factor as plots in a grid and the second_factor as lines within one chart. — FrostyFrog, Aug 27 '21 at 12:59
I'd also like to add that I seem to have found a working "manual" solution on my own. As soon as it is in a proper, readable form I will share it with you. My solution is also rather complicated, since I'm adding each trace manually in a loop. It would therefore be an improvement if this could be achieved using some kind of existing library. I will get back to you. Thank you very much for your effort! — FrostyFrog, Aug 27 '21 at 13:01
