How do I create counts of how many observations were active between start and end years grouped by an id variable?
I have a df whose unit of analysis is organizations. These organizations are active for only certain periods of time (startyear, endyear) and in certain countries (acr). Some countries have more than one organization active at the same time. I would like to create a variable that reflects the number of contemporaneously active organizations for each country (for each observation the answer will be at least 1 and could be several dozen). For each observation's startyear, I believe I would need to go over my data frame by country and count how many times the startyear falls between other observations startyear/endyear. How to do it?
I have attempted to do this with dplyr (per Count how many fall in each group in R):
nsa2 <- nsa %>%
group_by(acr,startyear) %>%
mutate(count=n())
Is there a way to use filter() or between(), even though the between years are not fixed but change? or even apply to go back over the df?
Here is some sample data with a few obs that overlap:
structure(list(acr = structure(c(8L, 10L, 11L, 7L, 6L, 4L, 7L,
6L, 3L, 3L, 12L, 2L, 13L, 9L, 1L, 8L, 15L, 7L, 5L, 14L), .Label = c("AFG",
"ARG", "CHA", "DRC", "IRQ", "ISR", "MYA", "RUS", "RWA", "SOM",
"SUD", "THI", "UKG", "YUG", "ZIM"), class = "factor"), startyear = c(1994,
2009, 1971, 1996, 1965, 2008, 2000, 1990, 1992, 1992, 2003, 1974,
1957, 1997, 1996, 1946, 1976, 1996, 2004, 1998), endyear = c(1996,
2010, 1971, 1996, 1973, 2008, 2000, 1999, 1994, 1994, 2011, 1977,
1957, 2002, 1996, 1948, 1979, 1996, 2005, 1999), obsid = structure(c(5L,
3L, 19L, 6L, 11L, 10L, 20L, 9L, 17L, 18L, 8L, 12L, 14L, 4L, 2L,
16L, 1L, 13L, 15L, 7L), .Label = c("NSA.3.4-1030", "NSA.3.4-1126",
"NSA.3.4-1219", "NSA.3.4-1381", "NSA.3.4-1546", "NSA.3.4-157",
"NSA.3.4-1594", "NSA.3.4-1657", "NSA.3.4-1666", "NSA.3.4-1678",
"NSA.3.4-271", "NSA.3.4-346", "NSA.3.4-376", "NSA.3.4-391", "NSA.3.4-406",
"NSA.3.4-58", "NSA.3.4-679", "NSA.3.4-685", "NSA.3.4-892", "NSA.3.4-91"
), class = "factor")), row.names = c(NA, -20L), groups = structure(list(
acr = structure(c(1L, 4L, 15L, 26L, 43L, 44L, 44L, 59L, 59L,
75L, 75L, 77L, 83L, 87L, 91L, 97L, 104L, 105L), .Label = c("AFG",
"ALG", "ANG", "ARG", "AZE", "BFO", "BNG", "BOL", "BOS", "BUI",
"CAM", "CAO", "CDI", "CEN", "CHA", "CHL", "CHN", "COL", "COM",
"CON", "COS", "CRO", "CUB", "DJI", "DOM", "DRC", "EGY", "ERI",
"ETH", "FRN", "GAB", "GAM", "GHA", "GNB", "GRC", "GRG", "GUA",
"GUI", "HAI", "IND", "INS", "IRN", "IRQ", "ISR", "KEN", "LAO",
"LBR", "LEB", "LES", "LIB", "MAA", "MAC", "MAG", "MAL", "MEX",
"MLD", "MLI", "MOR", "MYA", "MZM", "NEP", "NIC", "NIG", "NIR",
"NTH", "OMA", "PAK", "PAN", "PAR", "PER", "PHI", "PNG", "POR",
"RUM", "RUS", "RVN", "RWA", "SAF", "SAL", "SAU", "SEN", "SIE",
"SOM", "SPN", "SRI", "SSD", "SUD", "SUR", "SYR", "TAJ", "THI",
"TOG", "TRI", "TUN", "TUR", "UGA", "UKG", "URU", "USA", "UZB",
"VEN", "YEM", "YPR", "YUG", "ZIM"), class = "factor"), startyear = c(1996,
1974, 1992, 2008, 2004, 1965, 1990, 1996, 2000, 1946, 1994,
1997, 2009, 1971, 2003, 1957, 1998, 1976), .rows = list(15L,
12L, 9:10, 6L, 19L, 5L, 8L, c(4L, 18L), 7L, 16L, 1L,
14L, 2L, 3L, 11L, 13L, 20L, 17L)), row.names = c(NA,
-18L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))