Frequency of data points by two variables in R

Question

I have what I know must be a simple question, but I can't seem to figure it out.

Suppose I have a dataset:

id <- c(1,1,1,2,2,3,3,4,4)
visit <- c("A", "B", "C", "A", "B", "A", "C", "A", "B")
test <- c(12,16, NA, 11, 15,NA, 0,12, 5)

df <- data.frame(id,visit,test)

I want to know the number of non-missing data points per visit, so that the final output looks something like this:

visit   test
A       3
B       3
C       1

How would I go about doing this? I've tried using table:

table(df$visit, df$test)

but I get a full grid of counts for every combination of visit and test value.

I can sum each row by doing this:

sum(table(df$visit, df$test)[1,])
sum(table(df$visit, df$test)[2,])
sum(table(df$visit, df$test)[3,])

But I feel like there is an easier way and I'm missing it! Any help would be greatly appreciated!

Also: `table(df[!is.na(df$test),"visit"])` should work I think. To get as a data.frame just use `data.frame(table(df[!is.na(df$test),"visit"]))` — Mike H., May 04 '17 at 21:41
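A minimal sketch of the suggestion in this comment, assuming the df from the question. The renaming of the result columns to visit and test is my own addition to match the desired output. As far as I know, data.frame() on a one-dimensional table yields columns called Var1 and Freq by default.

```r
# Rebuild the example data from the question
id <- c(1, 1, 1, 2, 2, 3, 3, 4, 4)
visit <- c("A", "B", "C", "A", "B", "A", "C", "A", "B")
test <- c(12, 16, NA, 11, 15, NA, 0, 12, 5)
df <- data.frame(id, visit, test)

# Keep only rows where test is not NA, then tabulate the visit column
counts <- table(df[!is.na(df$test), "visit"])

# Convert to a data frame; the default column names are Var1 and Freq
out <- data.frame(counts)

# Rename to match the layout asked for in the question (my own choice of names)
names(out) <- c("visit", "test")
print(out)
```

The filtering step is what drops the missing values here; table() only ever sees the visit labels of rows that have a test result.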

score 1 · Accepted Answer · answered May 04 '17 at 21:29

aggregate from base R would be ideal for this. Group id by visit and count the length of each group. Remove the rows where test is NA using !is.na() before computing the length.

aggregate(x = df$id[!is.na(df$test)], by = list(df$visit[!is.na(df$test)]), FUN = length)
#  Group.1 x
#1       A 3
#2       B 3
#3       C 1

d.b


Thanks! This was really helpful! – Sheila May 04 '17 at 21:36

Just `aggregate(test ~ visit, df, length)` ? – David Arenburg May 04 '17 at 21:38
If you're particularly enamoured with the standard interface to aggregate, `aggregate(!is.na(df["test"]), df["visit"], FUN=sum)` might be simpler. If you select via `df["col"]` too, you keep the variable names in the output. But David's use of the formula interface and its default `na.action` is neat. – thelatemail May 04 '17 at 22:06
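To compare the two comment suggestions side by side, here is a rough sketch. The extra visit "D" row, which has only a missing test value, is a hypothetical addition of mine and is not in the original data. It is there to show one difference I would expect: the formula interface drops NA rows before grouping (its default na.action is na.omit), so a visit with no results disappears, while summing !is.na() keeps it with a count of 0.

```r
# Example data from the question
id <- c(1, 1, 1, 2, 2, 3, 3, 4, 4)
visit <- c("A", "B", "C", "A", "B", "A", "C", "A", "B")
test <- c(12, 16, NA, 11, 15, NA, 0, 12, 5)
df <- data.frame(id, visit, test)

# Hypothetical extra row (not in the question): a visit with no test result
df2 <- rbind(df, data.frame(id = 5, visit = "D", test = NA))

# Formula interface: rows with NA are removed first, then length() counts
# what is left in each visit group
by_formula <- aggregate(test ~ visit, data = df2, FUN = length)
print(by_formula)

# Standard interface: sum the TRUE values of !is.na(test) within each visit;
# selecting with df2["col"] keeps the column names in the result
by_sum <- aggregate(!is.na(df2["test"]), df2["visit"], FUN = sum)
print(by_sum)
```

On the original df both calls should give the same three rows; which behaviour is preferable for an all-missing visit depends on whether a zero count is meaningful for you.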

score 0 · Answer 2 · answered May 04 '17 at 21:38

How about:

data.frame(rowSums(table(df$visit, df$test)))
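For what it's worth, this one-liner works because table() leaves out NA values by default (useNA = "no"), so each row of the grid only counts non-missing test values. A small sketch, assuming the df from the question, that also gives the result the column names from the desired output (the names visit and test on the result are my own choice):

```r
# Example data from the question
id <- c(1, 1, 1, 2, 2, 3, 3, 4, 4)
visit <- c("A", "B", "C", "A", "B", "A", "C", "A", "B")
test <- c(12, 16, NA, 11, 15, NA, 0, 12, 5)
df <- data.frame(id, visit, test)

# Cross-tabulate visit by test value; NA test values are excluded by default
tab <- table(df$visit, df$test)

# Sum across each row to get the number of non-missing tests per visit
counts <- rowSums(tab)

# Build a data frame with explicit column names instead of row names
out <- data.frame(visit = names(counts), test = as.vector(counts))
print(out)
```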


989

