Pandas: Create new column that finds & selects value from another row based on conditions

Question

I need to create a new column in Pandas that finds the value for a specific day from a specific week for a specific ID from prior data. Data looks like this:

ID / Day / Week / Value
1 / Mon / 5 / 10
1 / Tue / 5 / 12
1 / Wed / 5 / 17

2 / Mon / 5 / 12
2 / Tue / 5 / 14
2 / Wed / 5 / 12

3 / Mon / 5 / 16
3 / Tue / 5 / 19
3 / Wed / 5 / 22

1 / Mon / 6 / 12
1 / Tue / 6 / 17
1 / Wed / 6 / 16

2 / Mon / 6 / 15
2 / Tue / 6 / 15
2 / Wed / 6 / 20

3 / Mon / 6 / 10
3 / Tue / 6 / 14
3 / Wed / 6 / 17

1 / Mon / 7 / 12
1 / Tue / 7 / 19
1 / Wed / 7 / 22

2 / Mon / 7 / 13
2 / Tue / 7 / 14
2 / Wed / 7 / 25

3 / Mon / 7 / 11
3 / Tue / 7 / 16
3 / Wed / 7 / 20

Now let's say I want to create a new column called "Wk5Val" that always find and selects the value from Week 5 for the same day of the week for the same ID. For instance, for ID = 2, Day = Tue, Week=7, the value should be 14, since that's the Week 5 value for ID=2 & Day=Tue.

At this point, I've tried about a dozen different things and can't figure this one out. In my head, I'm thinking of it working equivalent to something like this:

df[(df.Day == df.Day) & (df.week == 5) & (df.ID == df.ID)]['Value'].values[0]

Except that will only work with specific values. I need this to work on every row in a Dataframe and adjust for the "Day" and "ID".

Is there any straight-forward way to do this?

Could you add the code for creating your data such as instructed in [this guide](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)? — xicocaio, Mar 31 '21 at 13:06
Am busy at the moment so will try to post proper answer later, sorry, but this should be solved by either 1) sorting your data and using pd.Series.shift(-1) OR 2) merging a copy of the dataframe onto the original and cross-referencing — Ricardo Udenze, Mar 31 '21 at 13:16

It_is_Chris · Answer 1 · 2021-03-31T14:53:43.250

IIUC

You can use boolean indexing with merge

df.merge(df.loc[df['Week'] == 5, ['ID', 'Day', 'Value']],
         on=['ID', 'Day'], suffixes=['', '_Week5'])

    ID  Day  Week  Value  Value_Week5
0    1  Mon     5     10           10
1    1  Mon     6     12           10
2    1  Mon     7     12           10
3    1  Tue     5     12           12
4    1  Tue     6     17           12
5    1  Tue     7     19           12
6    1  Wed     5     17           17
7    1  Wed     6     16           17
8    1  Wed     7     22           17
9    2  Mon     5     12           12
10   2  Mon     6     15           12
11   2  Mon     7     13           12
12   2  Tue     5     14           14
13   2  Tue     6     15           14
14   2  Tue     7     14           14
15   2  Wed     5     12           12
16   2  Wed     6     20           12
17   2  Wed     7     25           12
18   3  Mon     5     16           16
19   3  Mon     6     10           16
20   3  Mon     7     11           16
21   3  Tue     5     19           19
22   3  Tue     6     14           19
23   3  Tue     7     16           19
24   3  Wed     5     22           22
25   3  Wed     6     17           22
26   3  Wed     7     20           22

This is a good answer but I think a misleading description. You've used the ````.loc```` function to pass a new dataframe object to the merge function call. Therefore the boolean indexing and the merge function are unrelated — Ricardo Udenze, Mar 31 '21 at 13:38
@RicardoUdenze I am essentially passing a boolean Series to `.loc` to filter the original dataframe and then merge the original dataframe with the filtered one. How is the description misleading? — It_is_Chris, Mar 31 '21 at 13:53
It implied, to me anyway, that the merge function can directly utilise boolean indexing. I just think it could be explained better / the code could be more readable, but I'm just thinking about my work where I explain things to new coders a lot — Ricardo Udenze, Mar 31 '21 at 14:13

Pandas: Create new column that finds & selects value from another row based on conditions

1 Answers1