determining frequency of multiple variables based on multiple factors in R

Question

Suppose I have a dataset like this:

id <- c(1,1,1,2,2,3,3,4,4)
visit <- c("A", "B", "C", "A", "B", "A", "C", "A", "B")
test1 <- c(12,16, NA, 11, 15,NA, 0,12, 5)
test2 <- c(1,NA, 2, 2, 2,2, NA,NA, NA)
df <- data.frame(id,visit,test1,test2)

I want to know the number of data points per visit PER test so that the final output looks something like this:

visit   test1    test2
A       3        3
B       3        1
C       1        1

I know I can use the aggregate function like this for 1 variable as mentioned on this older post :

 aggregate(x = df$id[!is.na(df$test)], by = list(df$visit[!is.na(df$test)]), FUN = length)

but how would I go about doing this for multiple tests?

989 · Answer 1 · 2017-05-09T22:02:19.777

2

Using table and rowSums in base R:

cols <- 3:4    
sapply(cols, function(i) rowSums(table(df$visit, df[,i]), na.rm = TRUE))

#   [,1] [,2]
#A    3    3
#B    3    1
#C    1    1

edited May 09 '17 at 22:02

answered May 09 '17 at 21:57

989

12,579
5
31
53

Hi! That's great however it would still require me to list every test. Do you know of a way to do this with giving something like df[3:4] for example? – Sheila May 09 '17 at 21:58
You can set `cols` with the columns of interest. – 989 May 09 '17 at 22:08

score 2 · Accepted Answer · answered May 09 '17 at 21:59

You can also use data.table which could be useful for a flexible number of columns:

cols <- names(df)[grepl("test",names(df))]
setDT(df)[,lapply(.SD, function(x) sum(!is.na(x))), by = visit, .SDcols = cols]

df
#   visit test1 test2
#1:     A     3     3
#2:     B     3     1
#3:     C     1     1

determining frequency of multiple variables based on multiple factors in R

2 Answers2