2

To get the question clear, let me start with one baby example of my data frame.

ID <- c(rep("first", 2), rep("second", 4), rep("third",1), rep("fourth", 3))
Var_1 <- c(rep("A",2), rep("B", 2), rep("A",3), rep("B", 2), "A")
Var_2 <- c(rep("C",2), rep("D",3) , rep("C",2),  rep("E",2), "D")

DF <- data.frame(ID, Var_1, Var_2)

> DF
       ID  Var_1 Var_2
1   first     A     C
2   first     A     C
3  second     B     D
4  second     B     D
5  second     A     D
6  second     A     C
7   third     A     C
8  fourth     B     E
9  fourth     B     E
10 fourth     A     D

There is one ID factor variable and two factor variables Var_1 with R=2 factor levels and Var_2 with C=3 factor levels.

I would like to get a new data frame with (RxC)+1=(2x3)+1 Variables with the frequencies of all combinations of factor levels - separately for each level in ID Variable, that would look like this:

      ID   A.C  A.D  A.E  B.C  B.D  B.E
1  first    2    0    0    0    0    0
2 second    1    1    0    0    2    0
3  third    1    0    0    0    0    0
4 fourth    0    1    0    0    0    2

I tried a couple of functions, but results were not even close to this, so they are not even worth of mentioning. In original data frame I should get (6x9)+1=55 Variables.

EDIT: There are solutions for counting factor levels for one or many variables separatly, but I couldn´t figure it out how to make a common counts for combinations of factor levels for two (or more) variables. Implementig the solution to others seems easy now when I got the answers, but I could not get there by myself.

Peky84
  • 103
  • 7

3 Answers3

1

We could use paste to create a variable combining Var_1 and Var_2 and then produce a contingency table with ID and the new variable:

table(DF$ID,paste(DF$Var_1,DF$Var_2,sep="."))

output

         A.C A.D B.D B.E
  first    2   0   0   0
  fourth   0   1   0   2
  second   1   1   2   0
  third    1   0   0   0

To order the table rows, we would need to factor(DF$ID,levels=c("first","second","third","fourth")) beforehand.

scoa
  • 19,359
  • 5
  • 65
  • 80
1

Try

library(tidyr)
library(dplyr)

DF %>%
  unite(Var, Var_1, Var_2, sep = ".") %>%
  count(ID, Var) %>%
  spread(Var, n, fill = 0)

Which gives:

#Source: local data frame [4 x 5]
#
#      ID   A.C   A.D   B.D   B.E
#  (fctr) (dbl) (dbl) (dbl) (dbl)
#1  first     2     0     0     0
#2 fourth     0     1     0     2
#3 second     1     1     2     0
#4  third     1     0     0     0
Steven Beaupré
  • 21,343
  • 7
  • 57
  • 77
1

Using the dcast function from the reshape package (or data.table which has an enhanced implementation of the dcast function):

library(reshape2)
dcast(DF, ID ~ paste(Var_1,Var_2,sep="."), fun.aggregate = length)

which gives:

      ID A.C A.D B.D B.E
1  first   2   0   0   0
2 fourth   0   1   0   2
3 second   1   1   2   0
4  third   1   0   0   0
Jaap
  • 81,064
  • 34
  • 182
  • 193
  • The best solution for my problem here, because it results directly with the data frame. – Peky84 Feb 01 '16 at 13:44
  • 2
    Thanx. But I have to mention that the method of @StevenBeaupré will return a dataframe as well. – Jaap Feb 01 '16 at 13:47
  • True, I missed that. Sorry StevenBeaupré. His method though seems more individual, which is verry nice and even better choice, if you came yourself to the answer. I still prefer your proposal, because it´s a common used function. – Peky84 Feb 01 '16 at 14:01