
I have a question that I know must have a simple answer, but I can't seem to figure it out.

Suppose I have a dataset:

id <- c(1, 1, 1, 2, 2, 3, 3, 4, 4)
visit <- c("A", "B", "C", "A", "B", "A", "C", "A", "B")
test <- c(12, 16, NA, 11, 15, NA, 0, 12, 5)

df <- data.frame(id,visit,test)

I want to know the number of non-missing test values for each visit, so that the final output looks something like this:

visit   test
A       3
B       3
C       1

How would I go about doing this? I've tried using table:

table(df$visit, df$test)

but that gives me a full grid counting how often each test value occurs at each visit.

I can sum each row by doing this:

sum(table(df$visit, df$test)[1, ])
sum(table(df$visit, df$test)[2, ])
sum(table(df$visit, df$test)[3, ])

But I feel like there is an easier way and I'm missing it! Any help would be greatly appreciated!
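The three per-row sums above can be collapsed into a single call by summing over the table's first margin (the visits). A minimal sketch using base R's `margin.table()`, which recent R versions also expose as `marginSums()`; because `table()` drops `NA` values by default, the missing tests are not counted:

```r
# Rebuild the example data from the question
id    <- c(1, 1, 1, 2, 2, 3, 3, 4, 4)
visit <- c("A", "B", "C", "A", "B", "A", "C", "A", "B")
test  <- c(12, 16, NA, 11, 15, NA, 0, 12, 5)
df <- data.frame(id, visit, test)

# Cross-tabulate visit against test value; NA test values are excluded by default
tab <- table(df$visit, df$test)

# Sum over margin 1 (rows = visits) instead of indexing each row by hand
per_visit <- margin.table(tab, 1)
print(per_visit)
```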

Sheila
  • Also: `table(df[!is.na(df$test),"visit"])` should work I think. To get as a data.frame just use `data.frame(table(df[!is.na(df$test),"visit"]))` – Mike H. May 04 '17 at 21:41
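Building on the comment above, `data.frame(table(...))` names its columns `Var1` and `Freq`. A small sketch that instead names the argument to `table()` (which sets the dimension name) and passes `responseName` to `as.data.frame()`, so the columns come out as `visit` and `test` as in the question; the helper name `has_test` is just for illustration:

```r
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
  visit = c("A", "B", "C", "A", "B", "A", "C", "A", "B"),
  test  = c(12, 16, NA, 11, 15, NA, 0, 12, 5)
)

# Keep only the rows that actually have a test value
has_test <- !is.na(df$test)

# Naming the table() argument sets the dimension name, which becomes a column name;
# responseName replaces the default "Freq" count column name
counts <- as.data.frame(table(visit = df$visit[has_test]), responseName = "test")
print(counts)
```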

2 Answers


Base R's aggregate is well suited to this. Group id by visit and count the entries with length, after removing the rows where test is NA using !is.na():

aggregate(x = df$id[!is.na(df$test)], by = list(df$visit[!is.na(df$test)]), FUN = length)
#  Group.1 x
#1       A 3
#2       B 3
#3       C 1
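The `Group.1` and `x` column names in that output come from the unnamed `by` list and the unnamed `x` vector. A minimal variant, assuming you want the `visit`/`test` headers from the question, passes named lists instead; since `length` only counts entries, the filtered `test` column works as well as `id`:

```r
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
  visit = c("A", "B", "C", "A", "B", "A", "C", "A", "B"),
  test  = c(12, 16, NA, 11, 15, NA, 0, 12, 5)
)

# Rows with a recorded test value
keep <- !is.na(df$test)

# Named lists give the output columns readable names instead of Group.1 and x
counts <- aggregate(x = list(test = df$test[keep]),
                    by = list(visit = df$visit[keep]),
                    FUN = length)
print(counts)
```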
d.b
  • Thanks! This was really helpful! – Sheila May 04 '17 at 21:36
  • Just `aggregate(test ~ visit, df, length)`? – David Arenburg May 04 '17 at 21:38
  • If you're particularly enamoured with the standard interface to aggregate, `aggregate(!is.na(df["test"]), df["visit"], FUN=sum)` might be simpler. If you select via `df["col"]` too, you keep the variable names in the output. But David's use of the formula interface and its default `na.action` is neat. – thelatemail May 04 '17 at 22:06
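The formula interface gives the right counts because its default `na.action` (`na.omit`) drops rows with a missing `test` before `length` is applied. A short sketch contrasting that default with `na.action = na.pass`, which keeps the `NA` rows and therefore counts rows per visit rather than recorded test values:

```r
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
  visit = c("A", "B", "C", "A", "B", "A", "C", "A", "B"),
  test  = c(12, 16, NA, 11, 15, NA, 0, 12, 5)
)

# Default na.action = na.omit: rows with NA in test are dropped before counting
non_missing <- aggregate(test ~ visit, data = df, FUN = length)

# na.pass keeps the NA rows, so length() counts every row for each visit
all_rows <- aggregate(test ~ visit, data = df, FUN = length, na.action = na.pass)

print(non_missing)
print(all_rows)
```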

How about:

data.frame(rowSums(table(df$visit, df$test)))
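`rowSums()` returns a named numeric vector, so wrapping it directly in `data.frame()` puts the visits in the row names and gives the count column a long auto-generated name. A small sketch, assuming the two-column layout from the question is wanted, that moves the names into their own column:

```r
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
  visit = c("A", "B", "C", "A", "B", "A", "C", "A", "B"),
  test  = c(12, 16, NA, 11, 15, NA, 0, 12, 5)
)

# table() drops NA tests by default, so each row sum counts recorded values
row_counts <- rowSums(table(df$visit, df$test))

# Turn the vector names into a visit column and give the counts a readable name
counts <- data.frame(visit = names(row_counts), test = unname(row_counts))
print(counts)
```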
989