I've actually already asked this question for SQL and got a great answer here: SQL - LAG to get previous value if condition using multiple previous columns satisfied
But now I need it for Pandas. Say we have a dataframe:
df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8],
'EventName':['Team A vs Team B',
'Team A vs Team B',
'Team C vs Team D',
'Team Z vs Team A',
'Team A vs Team B',
'Team C vs Team D',
'Team C vs Team D',
'Team E vs Team F',],
'HomeTeam': ['Team A', 'Team A', 'Team C', 'Team Z',
'Team A', 'Team C', 'Team C', 'Team E'],
'Metric':[5,7,6,8,9,3,1,2]})
Which results in:
id EventName HomeTeam Metric
------------------------------------------
1 Team A vs Team B Team A 5
2 Team A vs Team B Team A 7
3 Team C vs Team D Team C 6
4 Team Z vs Team A Team Z 8
5 Team A vs Team B Team A 9
6 Team C vs Team D Team C 3
7 Team C vs Team D Team C 1
8 Team E vs Team F Team E 2
I want to calculate a new column PreviousMetricN where N can be 1, 2, 3, ... which shows the previous value for Metric, but only if the HomeTeam was involved in the previous event. For example:
id EventName HomeTeam Metric PreviousMetric1 PreviousMetric2
------------------------------------------------------------------------
1 Team A vs Team B Team A 5 NULL NULL
2 Team A vs Team B Team A 7 5 NULL
3 Team C vs Team D Team C 6 NULL NULL
4 Team Z vs Team A Team Z 8 NULL NULL
5 Team A vs Team B Team A 9 8 7
6 Team C vs Team D Team C 3 6 NULL
7 Team C vs Team D Team C 1 3 6
8 Team E vs Team F Team E 2 NULL NULL
I guess it would be pretty easy with a for loop. But I need a solution that is vectorized or uses some combination of shift
/groupby
/np.where
. Not even sure where to start with this?