I have a dataframe consisting of columns Name
(names), value
(the week in 2016 that an event occurred), binary
(an indication that the event occurred, "1"), for example:
df
Name value binary
apple 2016 W16 1
orange 2016 W17 1
melon 2016 W20 1
berry 2016 W17 1
lime 2016 W19 1
I am interested in adding rows to this dataframe so that each Name
(apple, orange, etc.) has an item in the value
column for the weeks before an event occurred. Again, the week that the event occurred is stated as the value
column in df
. The time period of interest is the weeks between 2016 W16 and 2016 W19, e.g.:
start_end_weeks
week
2016 W16
2016 W17
2016 W18
2016 W19
My problem is that I need to in-fill the rows with weeks between 2016 W16 and 2016 W19 that are not represented in df
. Here's what I mean:
df_result
Name value binary
apple 2016 W16 1
orange 2016 W16 0
orange 2016 W17 1
melon 2016 W16 0
melon 2016 W17 0
melon 2016 W18 0
melon 2016 W19 0
melon 2016 W20 1
berry 2016 W17 1
lime 2016 W19 1
...
But since value
isn't a traditional date time object, I'm not sure how to get python to recognize that 2016 W16 occurs before 2016 W17, and then to only in-fill the values before the week stated in value
in df
.
I am having trouble with where to start, so if someone could help me convert value
to a date time object that would be great, and I can go from there. Any other insight appreciated.
I found this stack overflow question which is all I have so far: Match rows in one Pandas dataframe to another based on three columns.