I have a CSV file with dates, repair_id, number of onsite repairs and number of offsite repairs, so my data looks like this:
data repair_id num_onsite num_offsite
2016-02-01 A 3 0
2016-02-01 B 2 1
2016-02-01 D 0 4
2016-02-02 A 1 3
2016-02-02 C 1 1
2016-02-02 E 0 6
...
2016-02-14 A 1 3
2016-02-14 B 0 4
2016-02-14 D 2 0
2016-02-14 E 3 0
There are 5 different repair_ids, namely A, B, C, D and E. If a repairman (repair_id) had no work on a given date, they do not appear in the CSV file for that date. I would like to change that by including them with a value of 0 for num_onsite and num_offsite, so that my table would look like:
data repair_id num_onsite num_offsite
2016-02-01 A 3 0
2016-02-01 B 2 1
2016-02-01 C 0 0 # added
2016-02-01 D 0 4
2016-02-01 E 0 0 # added
2016-02-02 A 1 3
2016-02-02 B 0 0 # added
2016-02-02 C 1 1
2016-02-02 D 0 0 # added
2016-02-02 E 0 6
...
2016-02-14 A 1 3
2016-02-14 B 0 4
2016-02-14 C 0 0 # added
2016-02-14 D 2 0
2016-02-14 E 3 0
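One possible approach, offered as a minimal sketch rather than the only way, is to build every (date, repair_id) pair with pandas.MultiIndex.from_product and reindex the data onto that grid with fill_value=0. It assumes the file is whitespace-separated with the header shown above, that the date column really is named data, and that A to E is the complete list of repair IDs. The inline sample is just the first rows from the question; for the real file, the io.StringIO(raw) argument would be replaced by the file path.

```python
import io

import pandas as pd

# First rows from the question; the real file is assumed to be
# whitespace-separated with the same header ("data" holds the dates).
raw = """data repair_id num_onsite num_offsite
2016-02-01 A 3 0
2016-02-01 B 2 1
2016-02-01 D 0 4
2016-02-02 A 1 3
2016-02-02 C 1 1
2016-02-02 E 0 6
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Assumed complete list of repair IDs that should appear on every date.
repair_ids = ["A", "B", "C", "D", "E"]

# Every combination of a date present in the file and a repair_id.
full_index = pd.MultiIndex.from_product(
    [df["data"].unique(), repair_ids], names=["data", "repair_id"]
)

# Reindex onto the full grid; pairs missing from the file get 0 in both counts.
result = (
    df.set_index(["data", "repair_id"])
    .reindex(full_index, fill_value=0)
    .reset_index()
)
print(result)
```

Because from_product iterates dates first, the result comes out ordered by date and then by repair_id, matching the desired table. Note that this only uses dates that already appear in the file; a date with no rows at all would still be missing.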
I've had a look at:
Pandas DataFrame insert / fill missing rows from previous dates
Missing data, insert rows in Pandas and fill with NAN
Add missing dates to pandas dataframe
but I wasn't able to adapt them to produce the output above.
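The linked questions mostly deal with filling in missing dates. A sketch that combines that idea with the missing repair IDs (assuming pandas 1.2 or later, which added how="cross" to merge) builds a grid of every calendar day from pd.date_range crossed with the five IDs, left-merges the real rows onto it and fills the gaps with 0. As before, the column name data, the whitespace separator and the A to E list are assumptions taken from the sample shown in the question.

```python
import io

import pandas as pd

# Rows copied from the question (the "..." middle section is omitted).
raw = """data repair_id num_onsite num_offsite
2016-02-01 A 3 0
2016-02-01 B 2 1
2016-02-01 D 0 4
2016-02-02 A 1 3
2016-02-02 C 1 1
2016-02-02 E 0 6
2016-02-14 A 1 3
2016-02-14 B 0 4
2016-02-14 D 2 0
2016-02-14 E 3 0
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", parse_dates=["data"])

repair_ids = pd.DataFrame({"repair_id": ["A", "B", "C", "D", "E"]})

# Every calendar day between the first and last date, so days with
# no rows at all in the file are added as well.
dates = pd.DataFrame(
    {"data": pd.date_range(df["data"].min(), df["data"].max(), freq="D")}
)

# Cartesian product: one row per (day, repair_id) pair.
grid = dates.merge(repair_ids, how="cross")

# Left-merge the real counts; pairs absent from the file come back as NaN.
result = grid.merge(df, on=["data", "repair_id"], how="left")

# Replace NaN with 0 and restore integer counts (NaN had forced float dtype).
count_cols = ["num_onsite", "num_offsite"]
result[count_cols] = result[count_cols].fillna(0).astype(int)
print(result)
```

The trade-off is that every day in the range is included, so with this sample the days from 2016-02-03 to 2016-02-13 appear with all-zero counts; if only dates present in the file should be kept, df["data"].unique() could be used in place of pd.date_range.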