R: How to get common counts (frequency) of levels of two factor variables by ID Variable (as new data frame)

Question

To get the question clear, let me start with one baby example of my data frame.

ID <- c(rep("first", 2), rep("second", 4), rep("third",1), rep("fourth", 3))
Var_1 <- c(rep("A",2), rep("B", 2), rep("A",3), rep("B", 2), "A")
Var_2 <- c(rep("C",2), rep("D",3) , rep("C",2),  rep("E",2), "D")

DF <- data.frame(ID, Var_1, Var_2)

> DF
       ID  Var_1 Var_2
1   first     A     C
2   first     A     C
3  second     B     D
4  second     B     D
5  second     A     D
6  second     A     C
7   third     A     C
8  fourth     B     E
9  fourth     B     E
10 fourth     A     D

There is one ID factor variable and two factor variables Var_1 with R=2 factor levels and Var_2 with C=3 factor levels.

I would like to get a new data frame with (RxC)+1=(2x3)+1 Variables with the frequencies of all combinations of factor levels - separately for each level in ID Variable, that would look like this:

      ID   A.C  A.D  A.E  B.C  B.D  B.E
1  first    2    0    0    0    0    0
2 second    1    1    0    0    2    0
3  third    1    0    0    0    0    0
4 fourth    0    1    0    0    0    2

I tried a couple of functions, but results were not even close to this, so they are not even worth of mentioning. In original data frame I should get (6x9)+1=55 Variables.

EDIT: There are solutions for counting factor levels for one or many variables separatly, but I couldn´t figure it out how to make a common counts for combinations of factor levels for two (or more) variables. Implementig the solution to others seems easy now when I got the answers, but I could not get there by myself.

^^Add a parenthesis at the end. – Paweł Sopel Feb 01 '16 at 13:14 — Paweł Sopel, Feb 01 '16 at 13:14

score 1 · Answer 1 · answered Feb 01 '16 at 13:16

We could use paste to create a variable combining Var_1 and Var_2 and then produce a contingency table with ID and the new variable:

table(DF$ID,paste(DF$Var_1,DF$Var_2,sep="."))

output

         A.C A.D B.D B.E
  first    2   0   0   0
  fourth   0   1   0   2
  second   1   1   2   0
  third    1   0   0   0

To order the table rows, we would need to factor(DF$ID,levels=c("first","second","third","fourth")) beforehand.

score 1 · Answer 2 · answered Feb 01 '16 at 13:16

Try

library(tidyr)
library(dplyr)

DF %>%
  unite(Var, Var_1, Var_2, sep = ".") %>%
  count(ID, Var) %>%
  spread(Var, n, fill = 0)

Which gives:

#Source: local data frame [4 x 5]
#
#      ID   A.C   A.D   B.D   B.E
#  (fctr) (dbl) (dbl) (dbl) (dbl)
#1  first     2     0     0     0
#2 fourth     0     1     0     2
#3 second     1     1     2     0
#4  third     1     0     0     0

score 1 · Accepted Answer · answered Feb 01 '16 at 13:23

1

Using the dcast function from the reshape package (or data.table which has an enhanced implementation of the dcast function):

library(reshape2)
dcast(DF, ID ~ paste(Var_1,Var_2,sep="."), fun.aggregate = length)

which gives:

      ID A.C A.D B.D B.E
1  first   2   0   0   0
2 fourth   0   1   0   2
3 second   1   1   2   0
4  third   1   0   0   0

answered Feb 01 '16 at 13:23

Jaap

81,064
34
182
193

The best solution for my problem here, because it results directly with the data frame. – Peky84 Feb 01 '16 at 13:44
2

Thanx. But I have to mention that the method of @StevenBeaupré will return a dataframe as well. – Jaap Feb 01 '16 at 13:47
True, I missed that. Sorry StevenBeaupré. His method though seems more individual, which is verry nice and even better choice, if you came yourself to the answer. I still prefer your proposal, because it´s a common used function. – Peky84 Feb 01 '16 at 14:01

R: How to get common counts (frequency) of levels of two factor variables by ID Variable (as new data frame)

3 Answers3

Linked