
I'm trying to make a Plotly stacked horizontal bar chart with a slider, but I can't make it work. I'm using plotly.graph_objects.

I have a dataset of Covid-19 infections from March 2020 to August 2021, something like this (the columns are Date, Age, Sex, Cases, Month-Year):

       Fecha     Edad       Sexo  Casos Mes-Año
0 2020-03-31   0 - 10   FEMENINO      8  Mar-20
1 2020-03-31   0 - 10  MASCULINO     10  Mar-20
2 2020-03-31  10 - 20   FEMENINO     25  Mar-20
3 2020-03-31  10 - 20  MASCULINO     21  Mar-20
4 2020-03-31  20 - 30   FEMENINO    113  Mar-20
5 2020-03-31  20 - 30  MASCULINO    120  Mar-20
6 2020-03-31  30 - 40   FEMENINO    104  Mar-20
7 2020-03-31  30 - 40  MASCULINO    165  Mar-20
8 2020-03-31  40 - 50   FEMENINO    101  Mar-20
9 2020-03-31  40 - 50  MASCULINO    160  Mar-20

Here is a sample of my data:

import pandas as pd
from pandas import Timestamp

df = pd.DataFrame(
 columns=['Fecha', 'Edad', 'Sexo', 'Casos', 'Mes-Año'],
 data=[[Timestamp('2020-03-31 00:00:00'), '20 - 30', 'FEMENINO', 4, 'Mar-20'],
  [Timestamp('2020-03-31 00:00:00'), '20 - 30', 'MASCULINO', 5, 'Mar-20'],
  [Timestamp('2020-03-31 00:00:00'), '30 - 40', 'FEMENINO', 2, 'Mar-20'],
  [Timestamp('2020-03-31 00:00:00'), '30 - 40', 'MASCULINO', 8, 'Mar-20'],
  [Timestamp('2020-04-30 00:00:00'), '20 - 30', 'FEMENINO', 26, 'Apr-20'],
  [Timestamp('2020-04-30 00:00:00'), '20 - 30', 'MASCULINO', 59, 'Apr-20'],
  [Timestamp('2020-04-30 00:00:00'), '30 - 40', 'FEMENINO', 57, 'Apr-20'],
  [Timestamp('2020-04-30 00:00:00'), '30 - 40', 'MASCULINO', 129, 'Apr-20'],
  [Timestamp('2020-05-31 00:00:00'), '20 - 30', 'FEMENINO', 61, 'May-20'],
  [Timestamp('2020-05-31 00:00:00'), '20 - 30', 'MASCULINO', 92, 'May-20'],
  [Timestamp('2020-05-31 00:00:00'), '30 - 40', 'FEMENINO', 131, 'May-20'],
  [Timestamp('2020-05-31 00:00:00'), '30 - 40', 'MASCULINO', 373, 'May-20'],
  [Timestamp('2020-06-30 00:00:00'), '20 - 30', 'FEMENINO', 93, 'Jun-20'],
  [Timestamp('2020-06-30 00:00:00'), '20 - 30', 'MASCULINO', 121, 'Jun-20'],
  [Timestamp('2020-06-30 00:00:00'), '30 - 40', 'FEMENINO', 190, 'Jun-20'],
  [Timestamp('2020-06-30 00:00:00'), '30 - 40', 'MASCULINO', 426, 'Jun-20'],
  [Timestamp('2020-07-31 00:00:00'), '20 - 30', 'FEMENINO', 91, 'Jul-20'],
  [Timestamp('2020-07-31 00:00:00'), '20 - 30', 'MASCULINO', 117, 'Jul-20'],
  [Timestamp('2020-07-31 00:00:00'), '30 - 40', 'FEMENINO', 192, 'Jul-20'],
  [Timestamp('2020-07-31 00:00:00'), '30 - 40', 'MASCULINO', 382, 'Jul-20'],
  [Timestamp('2020-08-31 00:00:00'), '20 - 30', 'FEMENINO', 85, 'Aug-20'],
  [Timestamp('2020-08-31 00:00:00'), '20 - 30', 'MASCULINO', 148, 'Aug-20'],
  [Timestamp('2020-08-31 00:00:00'), '30 - 40', 'FEMENINO', 197, 'Aug-20'],
  [Timestamp('2020-08-31 00:00:00'), '30 - 40', 'MASCULINO', 338, 'Aug-20']])

I want a horizontal stacked bar chart where the Y axis shows the age ranges (0-10, 10-20, 20-30, ..., 90-inf), the X axis shows the number of people infected, and the stacked segments are Male and Female. Without the slider it should look like this:

[Image: what I want for each step of the slider]

The slider would have one step per month, from March all the way to August.

This is what I've tried so far:

for value in pd.DatetimeIndex(df['Fecha']).sort_values().unique():
    df_FEM = df.loc[(pd.DatetimeIndex(df['Fecha']) == value) & (df['Sexo'] == 'FEMENINO')]
    df_MAS = df.loc[(pd.DatetimeIndex(df['Fecha']) == value) & (df['Sexo'] == 'MASCULINO')]
    
    fig = go.Figure(
        data=[
            go.Bar(
                x = df_FEM['Casos'], 
                y = df_FEM['Edad'],
                orientation = 'h',
                text = df_FEM['Casos'], 
                texttemplate = '%{text:,9r}',
                textfont = {'size':18}, 
                textposition ='inside', 
                insidetextanchor ='middle'
            ),
            go.Bar(
                x = df_MAS['Casos'], 
                y = df_MAS['Edad'],
                orientation = 'h',
                text = df_MAS['Casos'], 
                texttemplate = '%{text:,9r}',
                textfont = {'size':18}, 
                textposition ='inside', 
                insidetextanchor ='middle'
            )
        ],
        layout=go.Layout(
            xaxis = dict(title=dict(text='Casos Covid-19 Por Edad y Sexo: ',font=dict(size=18))),
            yaxis=dict(tickfont=dict(size=14)),
            barmode='stack'
        )
    )
    
# Create and add slider
steps = []
for i in range(len(fig.data)):
    print(len(fig.data))
    step = dict(
        method="update",
        args=[{"visible": [False] * len(fig.data)},
              {"title": "Slider switched to step: " + str(i)}],  # layout attribute,
    )
    step["args"][0]["visible"][i] = True  # Toggle i'th trace to "visible"
    steps.append(step)

sliders = [dict(
    active=0,
    currentvalue={"prefix": "Frequency: "},
    pad={"t": 50},
    steps=steps
)]

fig.update_layout(
    sliders=sliders
)

I have two problems:

First, the slider I get has only two steps. I know this is because a new Figure is created in each iteration of the loop, so at the end the figure only holds the two traces from the last iteration. What I don't know is how to solve this.

Second, even with this two-step slider, when I move it my stacked bar chart changes to a regular bar chart, without stacking anything.

That's pretty much it. I would appreciate any help I can get. Thanks.

David YL
  • Please share a sample of your data as described [here](https://stackoverflow.com/questions/63163251/pandas-how-to-easily-share-a-sample-dataframe-using-df-to-dict/63163254#63163254) – vestland Sep 08 '21 at 06:38
  • Where did you source your data? I can simulate some – Rob Raymond Sep 08 '21 at 06:47
  • @vestland I edited my question and added sample data. It is much smaller than my original, but I think it is enough to simulate it. – David YL Sep 10 '21 at 14:02

1 Answer

  • Your data source has not been shared, so I have used OWID daily case data, supplemented with UK demographics data, to generate a sample data set.
  • Using Plotly Express significantly simplifies creating traces and frames. This is a natural use case for animations.
  • Data frame preparation is the step that makes this simple: all the data is in one data frame, with columns for month end (the animation column), age (y-axis) and cases (x-axis).
  • Month end needed to be a string, not a date, to be valid for the animation_frame parameter.
  • Further formatting can be done, but this does what you are looking for.
import pandas as pd
import io, requests
import plotly.express as px

# get OWID data, just take UK
dfall = pd.read_csv(io.StringIO(
    requests.get("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv").text))
dfall["date"] = pd.to_datetime(dfall["date"])

dfme = dfall.loc[
    dfall["iso_code"].eq("GBR"), ["iso_code", "location", "date", "new_cases"]
].assign(monthend=lambda d: d["date"] + pd.offsets.MonthEnd(0))

# UK demographics
dfukdemo = pd.DataFrame({"Sex": ["Female", "Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male", "Male"], 
                         "age": ["0-19", "20-39", "40-59", "60-79", "80+", "0-19", "20-39", "40-59", "60-79", "80+"], 
                         "ratio": [0.1129, 0.1306, 0.1324, 0.1, 0.03, 0.1182, 0.1324, 0.1295, 0.0932, 0.0208]})

# roll up daily to monthly, ignoring dates before March 2020
# (only new_cases is summed; summing the datetime "date" column fails on recent pandas)
dfme = (
    dfme.loc[dfme["date"].ge("2020-03-01")]
    .groupby(["iso_code", "location", "monthend"], as_index=False)["new_cases"]
    .sum()
)

# final data frame for plotting, breaks down by sex and age demographics
dfme = (
    dfme.assign(foo=1, me=lambda d: d["monthend"].dt.strftime("%b %y"))
    .merge(dfukdemo.assign(foo=1), on="foo")
    .assign(cases=lambda d: d["new_cases"] * d["ratio"])
)

# generate the plot
fig = px.bar(
    dfme,
    x="cases",
    y="age",
    text="cases",
    color="Sex",
    orientation="h",
    animation_frame="me",
)

# a bit of formatting...
fig = fig.update_traces(texttemplate="%{text:.3s}")

for f in fig.frames:
    for t in f["data"]:
        t["texttemplate"] = "%{text:.3s}"

fig.update_layout(xaxis={"range":[0,dfme["cases"].max()*2]})



Rob Raymond