
I'm new to plotly and to pandas, and I'm looking for an elegant solution: I suspect that either I'm not using groupby efficiently with plotly, or my data is stacked in a way that prevents me from visualizing it.

To make a test chart, I built a fake data set by zipping together three lists (month, group, spend) and split it into "actual" and "forecast" values after a specific month (Mar '20).

When I tried to add a single trace for the forecast df, which contains three different groups across several months, I got the monster below.

(Image: An attempt at using trace for all three groups)

When I set the group as the index and then used loc to split the data into three separate sets (one per group), I managed to make the following chart, although it feels like a Frankenstein solution:

(Image: Using three separate traces)

I'm wondering whether there is a way to have plotly chart the initial dataframe and change the line style after a certain point on the x axis. If not, is there a way to add a trace for a subset of the data that contains three different groups (group1, group2, group3)? I'm not sure that using three separate traces and splitting the data over and over is the best solution, and I suspect there is a more efficient one.

Here is how I currently get the separate groups:

# set the group column as the index
forecast = forecast.set_index(['group'])

# split into one dataframe per group
group1_forecast = forecast.loc['group1']
group2_forecast = forecast.loc['group2']
group3_forecast = forecast.loc['group3']

Here is the (minimum) code for the chart with the separate traces:

import plotly.express as px

title = "title"  # any title works here

# solid lines for the actual values, one per group
fig = px.line(actual,
              x="month", y="spend", color='group',
              title=title)

# group1 forecast trace
fig.add_scatter(
    x=group1_forecast.month,
    y=group1_forecast.spend,
    mode='lines',
    line=dict(shape='linear', color='purple', width=1, dash='dot'),
    connectgaps=True
)

# group2 forecast trace
fig.add_scatter(
    x=group2_forecast.month,
    y=group2_forecast.spend,
    mode='lines',
    line=dict(shape='linear', color='#33C1FF', width=1, dash='dot'),
    connectgaps=True
)

# group3 forecast trace
fig.add_scatter(
    x=group3_forecast.month,
    y=group3_forecast.spend,
    mode='lines',
    line=dict(shape='linear', color='#FFDD33', width=1, dash='dot'),
    connectgaps=True
)

fig.show()

Here's the data:

months = ["Mar '19", "Mar '19", "Mar '19", 
          "Apr '19", "Apr '19", "Apr '19", 
          "May '19", "May '19", "May '19", 
          "Jun '19", "Jun '19", "Jun '19", 
          "Jul '19", "Jul '19", "Jul '19", 
          "Aug '19", "Aug '19", "Aug '19", 
          "Sep '19", "Sep '19", "Sep '19", 
          "Oct '19", "Oct '19", "Oct '19", 
          "Nov '19", "Nov '19", "Nov '19", 
          "Dec '19", "Dec '19", "Dec '19", 
          "Jan '20", "Jan '20", "Jan '20", 
          "Feb '20", "Feb '20", "Feb '20", 
          "Mar '20", "Mar '20", "Mar '20", 
          "Apr '20", "Apr '20", "Apr '20", 
          "May '20", "May '20", "May '20", 
          "Jun '20", "Jun '20", "Jun '20", 
          "Jul '20", "Jul '20", "Jul '20", 
          "Aug '20", "Aug '20", "Aug '20", 
          "Sep '20", "Sep '20", "Sep '20"]

groups = ['group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3']

spend = [57, 150, 75, 
        61.5, 156, 78, 
        66, 150, 75, 
        63, 162, 81, 
        69, 163.5, 81.75,
        76.5, 162, 81, 
        78, 168, 84,
        79.5, 168, 84, 
        84, 162, 81, 
        87, 169.5, 84.75, 
        93, 171, 85.5, 
        96, 169.5, 84.75, 
        97.5, 168, 84,
        97.9, 167.7, 84.5,
        98.4, 167.9, 85.1,
        99.9, 168.1, 85.7,
        100.9, 168, 86.1,
        101.6, 168.4, 86.3,
        102.7, 168.8, 86.9]

import pandas as pd

spend_by_group_list = list(zip(months, groups, spend))

spend_df = pd.DataFrame(spend_by_group_list, columns=['month', 'group', 'spend'])
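
The split into actual and forecast is not shown above. A minimal sketch of how it could be done, assuming the months appear in chronological order in the data. The names `demo_df`, `month_order` and `is_actual` are made up for this illustration, and the small frame only stands in for spend_df:

```python
import pandas as pd

# Tiny stand-in with the same columns as spend_df (hypothetical subset of rows).
demo_df = pd.DataFrame({
    "month": ["Feb '20", "Feb '20", "Mar '20", "Mar '20", "Apr '20", "Apr '20"],
    "group": ["group1", "group2"] * 3,
    "spend": [96, 169.5, 97.5, 168, 97.9, 167.7],
})

# The months are strings, so compare them by position in order of
# first appearance rather than alphabetically.
month_order = list(dict.fromkeys(demo_df["month"]))
cutoff = month_order.index("Mar '20")

# True for rows up to and including the cutoff month.
is_actual = demo_df["month"].map(month_order.index) <= cutoff

actual = demo_df[is_actual]
forecast = demo_df[~is_actual]
```

Both resulting frames still contain all groups, in the original row order.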
Lani
  • Could you provide sample data in a [reproducible way](https://stackoverflow.com/questions/20109391)? – Bill Huang Oct 24 '20 at 10:17
  • You're right, I removed it when I was editing the post! Thank you for making me aware of it @BillHuang – Lani Oct 24 '20 at 10:26
  • Should `spend_df` be `forecast`? Also, what are `go` and `px`? Please confirm the reproducibility of the code you posted, even if some aliases are commonly used. – Bill Huang Oct 24 '20 at 10:37
  • spend_df is the whole df, where the forecast values start after Mar '20 and the actual values go up to Mar '20. px here is plotly express, and the title is any title :) – Lani Oct 24 '20 at 10:55

1 Answer


I reimplemented your data processing steps, starting from the point where spend_df is created. I am not 100% sure what the root cause of your problem is, because you didn't provide the exact code to reproduce it. However, there should be no problem if you split your groups like this: spend_df[spend_df["group"] == "groupN"]. The order of the months is preserved.

# use spend_df created by your code

# split the different groups
split_month = 13  # rows per group that are actuals (Mar '19 through Mar '20)
ls_actual = []  # by group
ls_forecast = []  # by group
for i in range(3):
    df = spend_df[spend_df["group"] == f"group{i+1}"]
    ls_actual.append(df[:split_month])
    ls_forecast.append(df[split_month:])

actual = pd.concat(ls_actual, axis=0)  # stack vertically

# plot
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "browser"

# actual
ls_colors = ['purple', '#33C1FF', '#FFDD33']
fig = px.line(
    actual, x="month", y="spend", color='group',
    color_discrete_map={f"group{i+1}": ls_colors[i] for i in range(3)},
    title="title"
)

# forecast
for i in range(3):
    fig.add_scatter(
        x=ls_forecast[i].month,
        y=ls_forecast[i].spend,
        mode='lines',
        line=dict(shape='linear', color=ls_colors[i], width=1, dash='dot'),
        connectgaps=True
    )

fig.show()

Result:

(Image hosted on Imgur)

Bill Huang