How to count occurences in several variables by different criteria and grouped by two variables

Question

Uing R, I want to count the number of occurences in two variables by two other variables; IDS and year. One of the variables counted need to be counted by unique value. I have really looked around for an answer to this but I cannot seem to find it. I have a dataset like this (though including many more variables):

IDS = c(1,1,1,1,1,1,2,2) 
year = c(1,1,1,1,1,2,1,1) 
x = c(5, 5, 5, 10, 2, NA, 3, 3)
y = c(1, 2, 4, 0, NA, 2, 0, NA)
dfxy = data.frame(IDS, year, x, y)

dfxy
   IDS year   x  y
1   1    1    5  1
2   1    1    5  2
3   1    1    5  4
4   1    1   10  0
5   1    1    2 NA
6   1    2   NA  2
7   2    1    3  0
8   2    1    3 NA

I want a count of the number of occurences in the two columns x and y by each IDS and each year. The count in x needs to be by unique value of x. I want an output like this:

   IDS year x y
1   1    1  3 4
2   1    2  0 1
3   2    1  1 1

It is similar to the answer with cbind in

Aggregate / summarize multiple variables per group (i.e. sum, mean, etc)

which for me would look like

aggregate(cbind(x, y)~IDS+year, data=dfxy, ???)

NA counts as no occurence, any number counts as an occurence in y, in x each unique occurence must be counted (as long as it is not NA). There are no rows with NA in both x and y. I have tried using length instead of sum, but this only seem to summarize the number of rows equally for both x and y.

Ideas or a link I can find an answer to this in? Thanks

I guess the 2nd value for 'x' is 0 as there are no non-NA elements — akrun, Dec 04 '16 at 19:58

score 0 · Answer 1 · answered Dec 04 '16 at 19:56

0

We can try with dplyr

library(dplyr)
dfxy %>% 
     group_by(IDS, year) %>%
     summarise_each(funs(sum(!is.na(.))))

answered Dec 04 '16 at 19:56

akrun

874,273
37
540
662

1

Thanks @akrun, worked perfectly! But see my updated question, because I forgot something (in a minute) – Lisarv Dec 04 '16 at 20:06
So for the previous question, where the varible x did NOT need to be counted by unique values, but instead just in a similar way as y, your answer was perfect @akrun. My mistake that I discovered the difference in my data set late. – Lisarv Dec 04 '16 at 20:17
1

Sorry @akrun, I asked someone else who told me to do so, but I see your point and will remember it for next time if it happens! – Lisarv Dec 05 '16 at 12:52

alistaire · Accepted Answer · 2016-12-04T20:22:16.647

In aggregate, you need to specify the na.action parameter, as with the formula interface it defaults to na.omit, which will exclude most of your data:

aggregate(cbind(x, y) ~ IDS + year, dfxy, 
          FUN = function(x){sum(!is.na(x))}, na.action = na.pass)
##   IDS year x y
## 1   1    1 3 3
## 2   2    1 1 1
## 3   1    2 0 1

For the new question, add unique:

aggregate(cbind(x, y) ~ IDS + year, df, 
          FUN = function(x){sum(!is.na(unique(x)))}, na.action = na.pass)
##   IDS year x y
## 1   1    1 3 4
## 2   2    1 1 1
## 3   1    2 0 1

or

aggregate(cbind(x, y) ~ IDS + year, df, 
          FUN = function(x){length(unique(na.omit(x)))}, na.action = na.pass)
##   IDS year x y
## 1   1    1 3 4
## 2   2    1 1 1
## 3   1    2 0 1

Question changed slightly, see above, did not see your answer in time @alistaire, sorry for that. — Lisarv, Dec 04 '16 at 20:18

How to count occurences in several variables by different criteria and grouped by two variables

2 Answers2