I am trying to plot three different timeseries dataframes (each around 60000 records) using plotly, while highlighting weekends (and workhours) with a different background color.

Is there a way to do this without looping through the whole dataset, as is done in this solution? That method works, but its performance can be poor on large datasets.

vestland
Abdelrahman Shoman
  • Check `fig.layout['shapes']` in the answer linked as an example to understand the structure. You can create as many of these dictionaries as you need and assign them all at once with `fig.layout['shapes']=[...]`. However, I don't know how much this will improve the speed, because the list itself still has to be built with a loop or a comprehension. – r-beginners Sep 13 '21 at 13:38

3 Answers

I would consider using make_subplots and attaching a go.Scatter trace to the secondary y-axis to act as a background color, instead of using shapes to indicate weekends.

Essential code elements:

fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(go.Scatter(x=df['date'], y=df.weekend,
                         fill = 'tonexty', fillcolor = 'rgba(99, 110, 250, 0.2)',
                         line_shape = 'hv', line_color = 'rgba(0,0,0,0)',
                         showlegend = False
                        ),
              row = 1, col = 1, secondary_y=True)

Complete code:

import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import datetime
from plotly.subplots import make_subplots

pd.set_option('display.max_rows', None)

# data sample
cols = ['signal']
nperiods = 50
np.random.seed(2)
df = pd.DataFrame(np.random.randint(-1, 2, size=(nperiods, len(cols))),
                  columns=cols)
datelist = pd.date_range(datetime.datetime(2020, 1, 1).strftime('%Y-%m-%d'),periods=nperiods).tolist()
df['date'] = datelist 
df = df.set_index(['date'])
df.index = pd.to_datetime(df.index)
df.iloc[0] = 0
df = df.cumsum().reset_index()
df['signal'] = df['signal'] + 100
# flag weekends (Saturday = 5, Sunday = 6) with 1 and all other days with 0
df['weekend'] = (df['date'].dt.weekday >= 5).astype(int)

fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(go.Scatter(x=df['date'], y=df.weekend,
                         fill = 'tonexty', fillcolor = 'rgba(99, 110, 250, 0.2)',
                         line_shape = 'hv', line_color = 'rgba(0,0,0,0)',
                         showlegend = False
                        ),
              row = 1, col = 1, secondary_y=True)

fig.update_xaxes(showgrid=False)
# clip the secondary axis so the 0/1 weekend flag fills the full plot height, and hide its tick labels
fig.update_layout(yaxis2_range=[0, 0.1], yaxis2_showgrid=False, yaxis2_tickfont_color='rgba(0,0,0,0)')
fig.add_trace(go.Scatter(x=df['date'], y = df.signal, line_color = 'blue'), secondary_y = False)

fig.show()

Speed tests:

With nperiods = 2000 in the code snippet above, %%timeit on my system returns:

162 ms ± 1.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The approach in my original suggestion using fig.add_shape() is considerably slower:

49.2 s ± 2.18 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
vestland

You could use a filled area chart to highlight all weekends at once, without using a loop and without creating multiple shapes. See the code below for an example.

import pandas as pd
import numpy as np
import plotly.graph_objects as go

# generate a time series
df = pd.DataFrame({
    'date': pd.date_range(start='2021-01-01', periods=18, freq='D'),
    'value': 100 * np.cumsum(np.random.normal(loc=0.01, scale=0.005, size=18))
})

# define the y-axis limits
ymin, ymax = df['value'].min() - 5, df['value'].max() + 5

# create an auxiliary time series for highlighting the weekends, equal
# to "ymax" on Saturday and Sunday, and to "ymin" on the other days
df['weekend'] = np.where(df['date'].dt.day_name().isin(['Saturday', 'Sunday']), ymax, ymin)

# define the figure layout
layout = dict(
    plot_bgcolor='white',
    paper_bgcolor='white',
    margin=dict(t=5, b=5, l=5, r=5, pad=0),
    yaxis=dict(
        range=[ymin, ymax],  # fix the y-axis limits
        tickfont=dict(size=6),
        linecolor='#000000',
        color='#000000',
        showgrid=False,
        mirror=True
    ),
    xaxis=dict(
        type='date',
        tickformat='%d-%b-%Y (%a)',
        tickfont=dict(size=6),
        nticks=20,
        linecolor='#000000',
        color='#000000',
        ticks='outside',
        mirror=True
    ),
)

# add the figure traces
data = []

# plot the weekends as a filled area chart
data.append(
    go.Scatter(
        x=df['date'],
        y=df['weekend'],
        fill='tonext',
        fillcolor='#d9d9d9',
        mode='lines',
        line=dict(width=0, shape='hvh'),
        showlegend=False,
        hoverinfo='skip',  # disable hover on the background trace (None would leave the default hover on)
    )
)

# plot the time series as a line chart
data.append(
    go.Scatter(
        x=df['date'],
        y=df['value'],
        mode='lines+markers',
        marker=dict(size=4, color='#cc503e'),
        line=dict(width=1, color='#cc503e'),
        showlegend=False,
    )
)

# create the figure
fig = go.Figure(data=data, layout=layout)

# save the figure (static image export needs an export engine such as kaleido installed)
fig.write_image('figure.png', scale=2, width=500, height=300)

Flavia Giammarino

Two minor modifications to @vestland's answer (see the code block below) fixed the following for me:

  • Data points lay on the edges of the shaded vertical bars rather than clearly inside them. Using hvh for line_shape fixes this. See docs here.
  • The shaded vertical bar for the last weekend was missing. Using tozeroy for fill fixes this. See docs here.
fig.add_trace(go.Scatter(...,
                         fill = 'tozeroy', # Changed from 'tonexty'.                         
                         line_shape = 'hvh', # Changed from 'hv'.
                         ...,
                        ),