Similar to How to select the row with the maximum value in each group, but with an added wrinkle that is throwing me off:
I have a data set with countries, years, and a numeric value in a third column, similar to the following:
X <- c("USA","USA","USA","USA","MEXICO","MEXICO","MEXICO","MEXICO","CANADA","CANADA","CANADA","CANADA")
Y <- c(2000,2001,2002,2003,2000,2001,2002,2003,2000,2001,2002,2003)
Z <- c(20,5,3,40,8,6,12,18,3,2,5,10)
XYZ <- cbind(X,Y,Z)
XYZ
X Y Z
[1,] "USA" "2000" "20"
[2,] "USA" "2001" "5"
[3,] "USA" "2002" "3"
[4,] "USA" "2003" "40"
[5,] "MEXICO" "2000" "8"
[6,] "MEXICO" "2001" "6"
[7,] "MEXICO" "2002" "12"
[8,] "MEXICO" "2003" "18"
[9,] "CANADA" "2000" "3"
[10,] "CANADA" "2001" "2"
[11,] "CANADA" "2002" "5"
[12,] "CANADA" "2003" "10"
I need to create a variable that will extract the maximum value for a country group in Column X among observations prior to the given observation.
Given the example data, the resulting variable would appear as column AA below.
X Y Z AA
1 USA 2000 20 20
2 USA 2001 5 20
3 USA 2002 3 20
4 USA 2003 40 40
5 MEXICO 2000 8 8
6 MEXICO 2001 6 8
7 MEXICO 2002 12 12
8 MEXICO 2003 18 18
9 CANADA 2000 3 3
10 CANADA 2001 2 3
11 CANADA 2002 5 5
12 CANADA 2003 10 10
As you can see from the example, for the "USA" group, 20 is delivered as the maximum value for observations (1), (2), and (3), and 40 is delivered as the maximum for observation (4).
How can I create such a variable using an R or dplyr function?
Thank you in advance to any and all who can help!