I am working with the R programming language.
Suppose I have the following dataset:
library(dplyr)
set.seed(123)
n <- 100
years <- 2010:2020
id <- rep(1:n, each = length(years))
year <- rep(years, n)
my_data <- data.frame(id, year)
# randomly keep 70% of all rows, so ids end up with different subsets of years
my_data <- my_data[sample(nrow(my_data), 0.7*nrow(my_data)), ]
# add var1 and var2 as random 0/1 variables
my_data$var1 <- sample(c(0, 1), nrow(my_data), replace = TRUE)
my_data$var2 <- sample(c(0, 1), nrow(my_data), replace = TRUE)
my_data = my_data %>% arrange(id, year)
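A quick sanity check on the simulated data, as a sketch that is not part of the original workflow. It shows how many years each id keeps after the random 70% subsample. The names years_per_id and n_years are illustrative. var1 and var2 are omitted because they are generated after the subsample and do not affect which rows survive.

```r
library(dplyr)

# Re-create the id/year part of the simulated data from the question
set.seed(123)
n <- 100
years <- 2010:2020
my_data <- data.frame(id = rep(1:n, each = length(years)),
                      year = rep(years, n))
my_data <- my_data[sample(nrow(my_data), 0.7 * nrow(my_data)), ]

# Number of years each id kept after the random subsample
years_per_id <- my_data %>% count(id, name = "n_years")
summary(years_per_id$n_years)
```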
I am using the following code (from the answer to "Counting Number of Unique Column Values Per Group") to count how many times each combination of years appears:
agg <- aggregate(year ~ id, my_data, paste, collapse = ", ")
final = as.data.frame(table(agg$year))
Var1 Freq
1 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 1
2 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2019 2
3 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2019, 2020 1
4 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2020 1
5 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2019 1
6 2010, 2011, 2012, 2013, 2014, 2016, 2017, 2018, 2019, 2020 1
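For comparison, here is a dplyr pipeline that should give the same counts: collapse each id's sorted years into one string, then count identical strings. This is a sketch on a hypothetical toy data set (toy and combo_counts are illustrative names), not the method from the linked answer.

```r
library(dplyr)

# Hypothetical toy data: ids 1 and 2 share the same set of years, id 3 does not
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012)
)

# One string of years per id, then the number of ids sharing each string
combo_counts <- toy %>%
  arrange(id, year) %>%
  group_by(id) %>%
  summarise(years = paste(year, collapse = ", "), .groups = "drop") %>%
  count(years, name = "Freq")

combo_counts
```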
My question: I now want to add information about var1 and var2 to this table. For example, how many times does the combination 2010, 2011, 2012 appear when var1 = 1 and var2 = 1, when var1 = 0 and var2 = 0, and so on?
Here is the approach I am using:
agg <- aggregate(year ~ id + var1 + var2, my_data, paste, collapse = ", ")
final = as.data.frame(table(agg$year, agg$var1, agg$var2))
colnames(final) = c("year", "var1", "var2", "Freq")
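To see what this approach counts, here is a minimal sketch on a hypothetical toy data set with a single id observed in three years (toy and final_nonzero are illustrative names). It runs the same aggregate() and table() calls, then drops the rows where Freq is 0. Those rows appear because table() cross-classifies every level of every factor, including combinations that never occur.

```r
# Hypothetical toy data: one id, with var1 changing between years
toy <- data.frame(
  id   = c(1, 1, 1),
  year = c(2010, 2011, 2012),
  var1 = c(1, 1, 0),
  var2 = c(1, 1, 1)
)

# Grouping by id + var1 + var2 splits this id's years by its (var1, var2) values
agg <- aggregate(year ~ id + var1 + var2, toy, paste, collapse = ", ")
agg

# Same cross-tabulation as in the question
final <- as.data.frame(table(agg$year, agg$var1, agg$var2))
colnames(final) <- c("year", "var1", "var2", "Freq")

# Keep only combinations that actually occur
final_nonzero <- final[final$Freq > 0, ]
final_nonzero
```

Here agg has two rows for the same id: "2010, 2011" under var1 = 1, var2 = 1 and "2012" under var1 = 0, var2 = 1. Each row therefore describes the years in which an id had one particular (var1, var2) pair, not that id's full set of years.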
Can someone please tell me if this is correct?
Thanks!