3

How do I count the distinct response patterns across all the variables of a matrix in R? For example, I have a dataset with three variables (a, b, c) and three observations.

a<-c(0,1,0)
b<-c(1,0,1)
c<-c(1,0,1)
d<-cbind(a,b,c)

resulting in the matrix

d

which, if I'm not mistaken, has two response patterns: 0,1,1 (present in two observations) and 1,0,0 (present in one observation). Is there a function that can report this, so that I can apply it to a much bigger dataset?

Thanks

Marco M

4 Answers

2

Variation on the theme, which allows selection of the columns of interest.

d<-data.frame(d)
d$combined<-with(d, paste(a,b,c))
table(d$combined)

0 1 1 1 0 0 
    2     1

data.frame(table(d$combined))

   Var1 Freq
1 0 1 1    2
2 1 0 0    1

See count unique values in R for hints.
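If the columns of interest are held in a character vector, the same paste-and-tabulate idea can be written without naming each column. This is only a sketch: `cols`, `patterns` and `counts` are names made up for illustration, and the data frame is rebuilt from the question's example.

```r
# Rebuild the example from the question as a data frame.
d <- data.frame(a = c(0, 1, 0), b = c(1, 0, 1), c = c(1, 0, 1))

# Hypothetical selection: any character vector of column names works here.
cols <- c("a", "b", "c")

# Paste the selected columns together row by row (space-separated by default).
# unname() keeps a column called "sep" or "collapse" from being read as an
# argument to paste().
patterns <- do.call(paste, unname(as.list(d[cols])))

# Tabulate the resulting pattern strings.
counts <- table(patterns)
counts
```

Changing `cols` to a subset, for example `c("a", "b")`, counts patterns over those columns only.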

Community
Etienne Low-Décarie
2

Another possibility would be to use duplicated(). This gives only the number of unique response patterns, not the patterns themselves.

length( which( !duplicated(d) ) )

# [1] 2

This looks to be substantially faster than using apply() and/or table().
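For reference, here is a small sketch of two base R spellings of the same idea on the question's example matrix. The names `n_patterns` and `unique_rows` are only illustrative; `unique()` is shown because it also returns the patterns themselves.

```r
# The example matrix from the question.
d <- cbind(a = c(0, 1, 0), b = c(1, 0, 1), c = c(1, 0, 1))

# duplicated() flags each row that repeats an earlier row, so summing the
# negation counts the distinct rows.
n_patterns <- sum(!duplicated(d))

# unique() on a matrix drops the repeated rows and keeps one copy of each pattern.
unique_rows <- unique(d)

n_patterns
unique_rows
```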

BenBarnes
1

You could try:

table(apply(d, 1, paste, collapse=""))
johannes
  • Nice, though I'd suggest collapsing with a very rarely used character (`merge` uses `\b`) for the odd chance that two different combinations could collapse to the same thing (for example, `1,21` and `12,1`.) – Aaron left Stack Overflow Apr 26 '12 at 15:32
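A quick sketch of the collision described in this comment, using a made-up two-row matrix `m` (not data from the question) and `"\b"` as the separator, purely as an example of a rarely used character:

```r
# Hypothetical data: two different rows whose digits concatenate to the same string.
m <- rbind(c(1, 21), c(12, 1))

# With an empty separator both rows collapse to "121" and look identical.
collapsed_plain <- apply(m, 1, paste, collapse = "")

# With a rarely used separator the two rows stay distinct.
collapsed_sep <- apply(m, 1, paste, collapse = "\b")

length(unique(collapsed_plain))
length(unique(collapsed_sep))
```

Any separator that cannot occur in the data would do; the backspace character is just one choice.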
1

What, no ddply solution yet? You plyr fans are getting slow...

> library(plyr)
> d <- as.data.frame(d)
> ddply(d, ~a+b+c, summarize, n=length(c))
  a b c n
1 0 1 1 2
2 1 0 0 1

The formula can also be built with paste to save typing. Note that n=length(c) counts rows via the column named c, so substitute a column that exists in your own data:

f <- paste("~", paste(names(d), collapse="+"))
ddply(d, as.formula(f), summarize, n=length(c))
Aaron left Stack Overflow
  • The plyr one is also very neat. Thank you Aaron. I like the fact that it reports the variables on top, you could potentially create a very clear heatmap! Let's say you had an extremely long matrix with hundreds of variables with different and complicated names, how would you tell ddply which are your variables without having to list them? Maybe something like a character vector that concatenates all the variables? Doing it manually may be somewhat time-consuming, or you'd have to resort to Excel. How do you create the a+b+c bit without having to specify the names of the variables? – Marco M May 04 '12 at 10:45
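One way to avoid listing the variables at all, sketched here in base R rather than plyr, is to let a formula's `.` stand for every remaining column. This assumes the data has no column already called `n`; the helper column `n` and the name `pattern_counts` are introduced only for illustration.

```r
# The example from the question as a data frame.
d <- data.frame(a = c(0, 1, 0), b = c(1, 0, 1), c = c(1, 0, 1))

# Helper column of ones (assumes no existing column is named "n").
d$n <- 1

# "n ~ ." groups by every column other than n, so no names need typing;
# summing the ones gives the number of rows per pattern.
pattern_counts <- aggregate(n ~ ., data = d, FUN = sum)
pattern_counts
```

The result keeps one column per variable plus the count, much like the ddply output above, though the row order may differ.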