I have panel data where I split the whole data set into multiple data frames by year and match unique keys across years. For example, if you have 6,000 observations in 2000 and 7000 observations in, I'm trying to match the overlap between each year for every year from 2000 to 2017.
I have a brute forced solution that's about 350 lines of copy and pasted code, but I'm looking for a more efficient and elegant solution using loops.
I'm working with for loops and looking into map() functions at the moment, but I haven't found a solution. I'm using R4DS.
#1989
b1989 <- b %>% filter(year == 1989) %>% select(key, V7, z9, z11, z13, z15)
a1990 <- a %>% select(key,year) %>% filter(year == 1990) %>% distinct()
br1989 <- inner_join(b1989, a1990, by = "key")
#1990
b1990 <- b %>% filter(year == 1990) %>% select(key, V7, z9, z11, z13, z15)
a1991 <- a %>% select(key,year) %>% filter(year == 1991) %>% distinct()
br1990 <- inner_join(b1990, a1991, by = "key")
#1991
b1991 <- b %>% filter(year == 1991) %>% select(key, V7, z9, z11, z13, z15)
a1992 <- a %>% select(key,year) %>% filter(year == 1992) %>% distinct()
br1991 <- inner_join(b1991, a1992, by = "key")
busrescount_t1 <- c(nrow(br1989),nrow(br1990),nrow(br1991))
busrescount_t1
[1] 4366 4956 4768
It currently works, but is simply bad code and cumbersome. Also, doing it at scale for 2-year, 3-year, 4-year differences in a nightmare and will be 1000+ lines of copy/pasted code.
The goal is to have a loop that produces a vector of these matches that can be placed into a data frame. I'm trying to do this for 20+ years.