x <- c(1,1,1,2,3,3,4,4,4,5,6,6,6,6,6,7,7,8,8,8,8)
y <- c('A','A','C','A','B','B','A','C','C','B','A','A','C','C','B','A','C','A','A','A','B')
X <- data.frame(x,y)

Above I have a data frame in which I want to identify the duplicated values in vector x while counting how many times each (x, y) combination occurs. I have found that ddply, and this post (Find how many times duplicated rows repeat in R data frame), come close to what I am looking for.

library(plyr)  # ddply() lives in the plyr package; there is no "ddply" package
ddply(X, .(x, y), nrow)
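For comparison, the same long-format count can be sketched in base R with aggregate(), without plyr. This is a minimal sketch: the helper column n and the object name long are hypothetical names introduced only for illustration. Like the ddply call, it returns one row per (x, y) combination that actually occurs, so combinations that never appear (such as 2 with B) are simply absent.

```r
x <- c(1,1,1,2,3,3,4,4,4,5,6,6,6,6,6,7,7,8,8,8,8)
y <- c('A','A','C','A','B','B','A','C','C','B','A','A','C','C','B','A','C','A','A','A','B')
X <- data.frame(x, y)

# Add a constant column 'n' (hypothetical helper) and sum it within each (x, y) group.
# The sum equals the number of rows in the group, like ddply(X, .(x, y), nrow).
long <- aggregate(n ~ x + y, data = transform(X, n = 1L), FUN = sum)

# Sort by x, then y, so the output reads like the ddply result
long <- long[order(long$x, long$y), ]
long
```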

The ddply call counts the number of times each (x, y) pair occurs; for example, (1, A) occurs 2 times. However, I want R to return each unique value of x together with the number of times it is paired with each value of y (dropping vector y if necessary), like this:

x  A  B  C
1  2  0  1
2  1  0  0
3  0  2  0
4  1  0  2
5  0  1  0
6  2  1  2 

Any help would be appreciated. Thanks!

boothtp

2 Answers


You just need the table function :)

> table(X)
   y
x   A B C
  1 2 0 1
  2 1 0 0
  3 0 2 0
  4 1 0 2
  5 0 1 0
  6 2 1 2
  7 1 0 1
  8 3 1 0
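If you want the result as a data frame laid out like the one in the question (x as an ordinary column, one column per value of y), the table can be converted. A minimal sketch; the object names tab and counts are just for illustration. Note that as.data.frame(tab) would give the long (x, y, Freq) format instead, which is why as.data.frame.matrix() is used here. xtabs(~ x + y, data = X) is an equivalent formula interface to table(X).

```r
x <- c(1,1,1,2,3,3,4,4,4,5,6,6,6,6,6,7,7,8,8,8,8)
y <- c('A','A','C','A','B','B','A','C','C','B','A','A','C','C','B','A','C','A','A','A','B')
X <- data.frame(x, y)

tab <- table(X)                        # contingency table: rows = x, columns = y
stopifnot(all(xtabs(~ x + y, data = X) == tab))  # same counts via the formula interface

counts <- as.data.frame.matrix(tab)    # keep the wide layout: columns A, B, C
counts <- cbind(x = as.numeric(rownames(counts)), counts)  # turn row names into a column
rownames(counts) <- NULL
counts
```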
Julien Navarre

This is fairly straightforward if you cast your data.frame from long to wide format:

require(reshape2)
dcast(X, x ~ y, fun.aggregate=length)
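To see what fun.aggregate=length does: the rows are grouped by each (x, y) cell, length() is applied to the values falling in that cell, so each cell holds a row count, and cells with no rows end up as 0. The same idea can be sketched in base R with tapply(); this is an illustration of the semantics, not how reshape2 is implemented internally, and the default argument assumes R >= 3.4.0.

```r
x <- c(1,1,1,2,3,3,4,4,4,5,6,6,6,6,6,7,7,8,8,8,8)
y <- c('A','A','C','A','B','B','A','C','C','B','A','A','C','C','B','A','C','A','A','A','B')
X <- data.frame(x, y)

# Split X$y by every (x, y) combination and take length() of each piece.
# Combinations that never occur get the default value 0 instead of NA.
counts <- tapply(X$y, list(x = X$x, y = X$y), length, default = 0L)
counts
```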

Or, if you want things to be faster (say, when working with large data), you can use the newly implemented dcast.data.table function from the data.table package:

require(data.table) ## >= 1.9.0
setDT(X)            ## convert data.frame to data.table by reference
dcast.data.table(X, x ~ y, fun.aggregate=length)

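Both give the same counts (shown below as printed by data.table; reshape2's dcast returns a plain data.frame, so its row names print without the colon):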

   x A B C
1: 1 2 0 1
2: 2 1 0 0
3: 3 0 2 0
4: 4 1 0 2
5: 5 0 1 0
6: 6 2 1 2
7: 7 1 0 1
8: 8 3 1 0
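With data.table you can also do the counting and the reshaping as two explicit steps, which mirrors the ddply approach from the question: .N counts rows per (x, y) group in long format, and dcast() then spreads y into columns. A sketch, assuming a data.table version recent enough that dcast() works on data.tables directly without reshape2; the object names DT, long and wide are illustrative.

```r
library(data.table)

x <- c(1,1,1,2,3,3,4,4,4,5,6,6,6,6,6,7,7,8,8,8,8)
y <- c('A','A','C','A','B','B','A','C','C','B','A','A','C','C','B','A','C','A','A','A','B')
DT <- data.table(x, y)

# Step 1: long-format counts, one row per observed (x, y) pair (column N)
long <- DT[, .N, by = .(x, y)]

# Step 2: spread y into columns; pairs that never occur are filled with 0
wide <- dcast(long, x ~ y, value.var = "N", fill = 0L)
wide
```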
Arun