- There is one existing solution at pandas - stacked bar chart with timeseries data.
- However, the OP of that question is not aggregating any data, so that solution doesn't work for this question.
- Use `pandas.DataFrame.groupby` on `'date'` and `'group'`, while aggregating with `.sum` on the value column (`'hours'` in the test data below).
- The `.dt` accessor is used to extract only the `.date` component of the `'date'` column.
- Make certain the `'Date'` column of your dataframe is properly formatted as a `datetime` dtype, with `df.Date = pd.to_datetime(df.Date)`.
- The grouped dataframe, `dfg`, must be reshaped into wide form, with the dates as the index and one column per group, which can be accomplished with `pandas.DataFrame.pivot`.
- The easiest way to stack a bar plot is to use `pandas.DataFrame.plot.bar` with the `stacked` parameter set to `True`.
- See `pandas.DataFrame.plot` for all the parameters.
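As a minimal sketch of the datetime conversion and the `.dt.date` extraction described above (the three-row dataframe is hypothetical sample data, not the OP's):

```python
import pandas as pd

# Hypothetical sample data: dates stored as strings, as often happens after reading a CSV
df = pd.DataFrame({
    'Date': ['2020-11-24 08:00:00', '2020-11-24 17:30:00', '2020-11-25 09:15:00'],
    'group': ['A', 'B', 'A'],
    'hours': [2, 4, 1],
})

# Convert the column to a datetime dtype so the .dt accessor becomes available
df['Date'] = pd.to_datetime(df['Date'])
print(df['Date'].dtype)

# .dt.date drops the time component, leaving one value per calendar day
days = df['Date'].dt.date
print(days.nunique())
```

The two timestamps on 2020-11-24 collapse to the same day, which is what allows them to be aggregated together.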
Imports and Data Transformation
import pandas as pd
import matplotlib.pyplot as plt
import random # for test data
import numpy as np # for test data
# setup dataframe with test data
np.random.seed(365)
random.seed(365)
rows = 1100
data = {'hours': np.random.randint(10, size=rows),
        'group': [random.choice(['A', 'B', 'C']) for _ in range(rows)],
        'date': pd.bdate_range('2020-11-24', freq='h', periods=rows).tolist()}
df = pd.DataFrame(data)
# display(df.head())
   hours group                date
0      2     C 2020-11-24 00:00:00
1      4     B 2020-11-24 01:00:00
2      1     C 2020-11-24 02:00:00
3      5     A 2020-11-24 03:00:00
4      2     B 2020-11-24 04:00:00
# use groupby on df
dfg = df.groupby([df.date.dt.date, 'group'])['hours'].sum().reset_index()
# pivot the dataframe into the correct format
dfp = dfg.pivot(index='date', columns='group', values='hours')
# display(dfp.head())
group        A   B   C
date
2020-11-24  49  25  29
2020-11-25  62  18  57
2020-11-26  42  77   4
2020-11-27  34  43  17
2020-11-28  28  53  23
- More succinctly, the groupby and pivot steps can be replaced with `.pivot_table`, which both reshapes and aggregates.
- `index=df.date.dt.date` is used so the index doesn't include the time component, since the data for the entire day is being aggregated.
dfp = df.pivot_table(index=df.date.dt.date, columns='group', values='hours', aggfunc='sum')
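To check that the two approaches agree, here is a small self-contained sketch. The data is hypothetical (10 days of hourly rows, with groups assigned in a fixed rotation so every group appears on every day), and the names `dfp_two_step` and `dfp_one_step` are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical test data: 240 hourly rows (10 full days), groups cycling A, B, C
rng = np.random.default_rng(0)
rows = 240
df = pd.DataFrame({
    'hours': rng.integers(10, size=rows),
    'group': np.tile(['A', 'B', 'C'], rows // 3),
    'date': pd.date_range('2020-11-24', freq='h', periods=rows),
})

# Two steps: aggregate with groupby, then reshape with pivot
dfg = df.groupby([df.date.dt.date, 'group'])['hours'].sum().reset_index()
dfp_two_step = dfg.pivot(index='date', columns='group', values='hours')

# One step: pivot_table aggregates and reshapes together
dfp_one_step = df.pivot_table(index=df.date.dt.date, columns='group',
                              values='hours', aggfunc='sum')

# Compare the resulting values, row labels, and column labels
same_values = np.array_equal(dfp_two_step.to_numpy(), dfp_one_step.to_numpy())
same_index = list(dfp_two_step.index) == list(dfp_one_step.index)
same_columns = list(dfp_two_step.columns) == list(dfp_one_step.columns)
print(same_values, same_index, same_columns)
```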
Plot
# plot the pivoted dataframe
dfp.plot.bar(stacked=True, figsize=(10, 6), ylabel='Hours', xlabel='Date', title='Sum of Daily Category Hours')
plt.legend(title='Category', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

- There will be a bar for each day, because bar plot ticks are categorical, so the plot could become very wide if there are many dates.
- Consider using `pandas.DataFrame.plot.barh` for a horizontal bar plot.
dfp.plot.barh(stacked=True, figsize=(6, 10), title='Sum of Daily Category Hours')
plt.legend(title='Category', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xlabel('Hours')
plt.ylabel('Date')
plt.show()
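The point about one bar per day can be seen by inspecting the ticks directly. In this sketch (hypothetical data with deliberate gaps between dates, and the non-interactive `Agg` backend chosen only so it runs without a display), the bars sit at consecutive integer positions and the dates are used only as labels, so gaps in the calendar are not reflected on the axis.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, an assumption so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical pivoted data: 5 dates with large gaps between some of them
idx = pd.to_datetime(['2020-11-24', '2020-11-25', '2020-12-10',
                      '2020-12-11', '2021-01-05']).date
dfp = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]}, index=idx)

ax = dfp.plot.bar(stacked=True)

# Tick locations are positions along the axis, not dates
tick_positions = ax.get_xticks()
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_positions)
print(tick_labels)

plt.close(ax.figure)
```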

- The OP states there is data from 2018 to 2020, which means there could be over 700 days' worth of data, and therefore over 700 bars in the bar plot.
- A standard line plot might be the best option to properly visualize the data.
dfp.plot(figsize=(10, 6))
plt.show()
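A rough sketch of the scale involved, assuming (hypothetically) one row for every calendar day from 2018-01-01 through 2020-12-31 and random values in place of the OP's data. A bar plot would need one bar per row, while the line plot draws just one line per group.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, an assumption so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumed date range; the OP only says the data spans 2018 to 2020
days = pd.date_range('2018-01-01', '2020-12-31', freq='D')
rng = np.random.default_rng(365)
dfp = pd.DataFrame(rng.integers(0, 80, size=(len(days), 3)),
                   index=days, columns=['A', 'B', 'C'])

# Number of rows, which is the number of bars a bar plot would have to draw
print(len(dfp))

# The line plot draws one line per column, regardless of the number of days
ax = dfp.plot(figsize=(10, 6), ylabel='Hours', xlabel='Date')
print(len(ax.get_lines()))

plt.close(ax.figure)
```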
