I have a large data frame with the columns V1 and V2 that represents an edge list. I want to create a third column, COUNT, that records how many times each exact edge appears. For example, if V1 == 1 and V2 == 2, I want to count every row where V1 == 1 and V2 == 2, combine those rows into one, and put the count in the third column.

Data <- data.frame(
    V1 = c(1,1),
    V2 = c(2,2)
)

I've tried something like new = aggregate(V1 ~ V2,data=df,FUN=length) but it's not working for me.

Sam

2 Answers

...or maybe use data.table:

library(data.table)
df <- data.table(v1 = c(1,2,3,4,5,1,2,3,1), v2 = c(2,3,4,5,6,2,3,4,3))
# := adds the per-group row count (.N) as a new column; every original row is kept
df[, count := .N, by = .(v1, v2)]
df

   v1 v2 count
1:  1  2     2
2:  2  3     2
3:  3  4     2
4:  4  5     1
5:  5  6     1
6:  1  2     2
7:  2  3     2
8:  3  4     2
9:  1  3     1
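
The output above keeps every original row and repeats the count on each duplicate. Since the question asks for duplicates to be combined into one row, here is a minimal sketch of a collapsed variant. It assumes the same sample data; the names `edges` and `edge_counts` are made up for illustration and are not from the original answer.

```r
library(data.table)

# Same sample edges as in the answer above (illustrative name: edges)
edges <- data.table(v1 = c(1, 2, 3, 4, 5, 1, 2, 3, 1),
                    v2 = c(2, 3, 4, 5, 6, 2, 3, 4, 3))

# .N is the number of rows in each (v1, v2) group. Wrapping it in .()
# returns one summary row per group instead of adding a column by reference.
edge_counts <- edges[, .(count = .N), by = .(v1, v2)]
edge_counts
```

With this sample data the result should have six rows, one per distinct edge, and `edges` itself is left unchanged.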
JCR

Assuming the data is structured as:

df<-data.frame(v1=c(1,2,3,4,5,1,2,3),v2=c(2,3,4,5,6,2,3,4),stringsAsFactors = FALSE)

> df
  v1 v2
1  1  2
2  2  3
3  3  4
4  4  5
5  5  6
6  1  2
7  2  3
8  3  4

Use the ddply function from the plyr package to count each edge pair:

library(plyr)
df2 <- ddply(df, .(v1,v2), function(df) c(count=nrow(df)))

> df2
  v1 v2 count
1  1  2     2
2  2  3     2
3  3  4     2
4  4  5     1
5  5  6     1
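
For comparison, the aggregate attempt from the question can be adapted to give the same counts in base R. The original formula, V1 ~ V2, groups by V2 only and stores the length in a column named V1, which is why the edge values get lost. The sketch below is one possible fix, not part of the original answer: it groups by both columns and sums a helper column of ones. The names `count` and `edge_counts` are illustrative.

```r
# Same sample data as in the answer above
df <- data.frame(v1 = c(1, 2, 3, 4, 5, 1, 2, 3),
                 v2 = c(2, 3, 4, 5, 6, 2, 3, 4))

# Helper column of ones, so that summing within a group counts its rows
df$count <- 1

# Group by both endpoints (v1 + v2) rather than by v2 alone
edge_counts <- aggregate(count ~ v1 + v2, data = df, FUN = sum)
edge_counts
```

This needs no extra packages and returns one row per distinct (v1, v2) pair.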
parth
  • I've ended up with V1 and V2 after the first step, (df2 <-...), yet V2 is the count and I've lost the actual values for V2. Any ideas? – Sam Sep 18 '17 at 10:47
  • @Sam, unsure regarding V2 overlap, may be due to clash with `v2` colname ... have updated solution to create new column `count` which should fix issue you pointed out – parth Sep 18 '17 at 12:55