My raw data looks like this:
raw_data = [
{
"date": "02.08.2020",
"status": "A",
"id": 3948,
},
{
"date": "02.08.2020",
"status": "B",
"id": 495,
},
{
"date": "03.08.2020",
"status": "A",
"id": 433,
},
{
"date": "03.08.2020",
"status": "B",
"id": 845,
},
{
"date": "03.08.2020",
"status": "B",
"id": 54,
},
{
"date": "03.08.2020",
"status": "C",
"id": 133,
},
{
"date": "04.08.2020",
"status": "B",
"id": 384,
},
{
"date": "04.08.2020",
"status": "C",
"id": 1234,
},
]
Then I create a pandas dataframe from it and group by date and status to get the count per date, per status:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(raw_data)
grp = df.groupby(['date', 'status']).size().to_frame(name = 'count').reset_index()
print(grp)
The output:
0 02.08.2020 A 1
1 02.08.2020 B 1
2 03.08.2020 A 1
3 03.08.2020 B 2
4 03.08.2020 C 1
5 04.08.2020 B 1
6 04.08.2020 C 1
I then chart this using seaborn:
sns.barplot(x="date", y="count", hue="status", data=grp)
plt.show()
And the chart looks like this:
Now that's cool, it visualises the status per day. But what I would like to do is fill the "gaps" between the days, because I might need to actually visualise the data starting from 01.08.2020 to 05.08.2020. I would like my dataframe to look like this:
0 01.08.2020 A 0 <<< added 3 empty rows for 01.08.2020 bc. no data
1 01.08.2020 B 0 <<<
2 01.08.2020 C 0 <<<
3 02.08.2020 A 1
4 02.08.2020 B 1
5 03.08.2020 A 1
6 03.08.2020 B 2
7 03.08.2020 C 1
8 04.08.2020 B 1
9 04.08.2020 C 1
10 05.08.2020 A 0 <<< added 3 empty rows for 05.08.2020 bc. no data
11 05.08.2020 B 0 <<<
12 05.08.2020 C 0 <<<
The idea is to have all days from 01.08 to 05.08 in the charts x-axis, even if there is no data for a given day. The goal is to make the x-axis continuous with all days from 01.08.2020 to 05.08.2020, data or no data.
I had a look at another SO question which is about adding missing days in a datetime index. It's kind of what I want, but I also need to have the status per day empty I guess.
Can I create a dataframe like I want with builtin pandas functionality?