I am working with the R programming language.

Suppose I have the following dataset:

set.seed(123) 
n <- 100 

library(dplyr)
years <- 2010:2020
id <- rep(1:n, each = length(years))
year <- rep(years, n)
my_data <- data.frame(id, year)

# randomly keep 70% of the rows, so that some ids lose some of their years
my_data <- my_data[sample(nrow(my_data), 0.7*nrow(my_data)), ]

# add var1 and var2 as random 0/1 variables
my_data$var1 <- sample(c(0, 1), nrow(my_data), replace = TRUE)
my_data$var2 <- sample(c(0, 1), nrow(my_data), replace = TRUE)

my_data = my_data %>% arrange(id, year)

I am using the following code (answer provided here: Counting Number of Unique Column Values Per Group) to find out how many times each combination of years appears:

agg <- aggregate(year ~ id, my_data, paste, collapse = ", ")
final = as.data.frame(table(agg$year))

                                                              Var1 Freq
1 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020    1
2             2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2019    2
3       2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2019, 2020    1
4             2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2020    1
5                   2010, 2011, 2012, 2013, 2014, 2015, 2016, 2019    1
6       2010, 2011, 2012, 2013, 2014, 2016, 2017, 2018, 2019, 2020    1
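
To make the counting step easier to follow, here is a minimal sketch on a tiny hand-built data set. The `toy` data frame and its values are made up for illustration and are not part of `my_data`; the two calls are the same `aggregate` and `table` calls used above.

```r
# Hypothetical toy data: ids 1 and 2 have the years 2010 and 2011,
# id 3 has only 2010.
toy <- data.frame(
  id   = c(1, 1, 2, 2, 3),
  year = c(2010, 2011, 2010, 2011, 2010)
)

# One row per id, with that id's years pasted into a single string.
agg_toy <- aggregate(year ~ id, toy, paste, collapse = ", ")

# Count how many ids share each pasted string.
counts <- as.data.frame(table(agg_toy$year))
print(counts)
```

Here the string "2010, 2011" should get a count of 2 (ids 1 and 2) and "2010" a count of 1 (id 3), so each row of the result is one distinct year combination together with the number of ids that have it.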

My Question: Now I want to add information about var1 and var2 to this table. For example: how many times does the combination 2010, 2011, 2012 appear when (var1 = 1 & var2 = 1), when (var1 = 0 & var2 = 0), etc.?

Here is the approach I am using:

agg <- aggregate(year ~ id + var1 + var2, my_data, paste, collapse = ", ")

final = as.data.frame(table(agg$year, agg$var1, agg$var2))

colnames(final) = c("year", "var1", "var2", "Freq")

Can someone please tell me if this is correct?

Thanks!

stats_noob
    You should know what you want better than us. If you can't look at your sample data and the result that prints out and tell if it is correct, then your sample data isn't good. Create sample data so that you know what the result should be, and then test your code to see if you get the expected result. – Gregor Thomas Jul 06 '23 at 18:00
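
The check suggested in the comment above can be sketched as follows. The `toy` data frame is a made-up example, small enough that the expected result can be worked out by hand; the `aggregate`, `table` and `colnames` calls are the ones from the question. Whether the behaviour it shows is the desired one depends on how a "combination of years" is meant to be defined when var1 and var2 change within an id.

```r
# Hypothetical toy data: id 1 has var1 = var2 = 1 in 2010 and 2011,
# but var1 = var2 = 0 in 2012. id 2 has var1 = var2 = 1 in both years.
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  year = c(2010, 2011, 2012, 2010, 2011),
  var1 = c(1, 1, 0, 1, 1),
  var2 = c(1, 1, 0, 1, 1)
)

# Same call as in the question: years are pasted within each
# (id, var1, var2) group, not within each id.
agg_toy <- aggregate(year ~ id + var1 + var2, toy, paste, collapse = ", ")

# Cross-tabulate the pasted strings against var1 and var2.
final_toy <- as.data.frame(table(agg_toy$year, agg_toy$var1, agg_toy$var2))
colnames(final_toy) <- c("year", "var1", "var2", "Freq")

print(agg_toy)
print(final_toy)
```

On this data, id 1 is split into two groups ("2010, 2011" with var1 = var2 = 1, and "2012" with var1 = var2 = 0), so its full sequence "2010, 2011, 2012" never shows up in the result. The table also contains a row for every possible (year string, var1, var2) combination, including those with a frequency of 0.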
