2

Suppose I have a dataset like this:

id <- c(1,1,1,2,2,3,3,4,4)
visit <- c("A", "B", "C", "A", "B", "A", "C", "A", "B")
test1 <- c(12,16, NA, 11, 15,NA, 0,12, 5)
test2 <- c(1,NA, 2, 2, 2,2, NA,NA, NA)
df <- data.frame(id,visit,test1,test2)

I want to know the number of data points per visit PER test so that the final output looks something like this:

visit   test1    test2
A       3        3
B       3        1
C       1        1

I know I can use the aggregate function like this for 1 variable as mentioned on this older post :

 aggregate(x = df$id[!is.na(df$test)], by = list(df$visit[!is.na(df$test)]), FUN = length)

but how would I go about doing this for multiple tests?

Community
  • 1
  • 1
Sheila
  • 2,438
  • 7
  • 28
  • 37

2 Answers2

2

Using table and rowSums in base R:

cols <- 3:4    
sapply(cols, function(i) rowSums(table(df$visit, df[,i]), na.rm = TRUE))

#   [,1] [,2]
#A    3    3
#B    3    1
#C    1    1
989
  • 12,579
  • 5
  • 31
  • 53
  • Hi! That's great however it would still require me to list every test. Do you know of a way to do this with giving something like df[3:4] for example? – Sheila May 09 '17 at 21:58
  • You can set `cols` with the columns of interest. – 989 May 09 '17 at 22:08
2

You can also use data.table which could be useful for a flexible number of columns:

cols <- names(df)[grepl("test",names(df))]
setDT(df)[,lapply(.SD, function(x) sum(!is.na(x))), by = visit, .SDcols = cols]

df
#   visit test1 test2
#1:     A     3     3
#2:     B     3     1
#3:     C     1     1
Mike H.
  • 13,960
  • 2
  • 29
  • 39