
I have a file like this in R.

**0   1**
0   2
**0   3**
0   4
0   5
0   6
0   7
0   8
0   9
0   10
**1   0**
1   11
1   12
1   13
1   14
1   15
1   16
1   17
1   18
1   19
**3   0**

As we can see, this file contains repeated unordered pairs (the rows marked above), such as

 1  0 

and

 0  1

I wish to collapse these duplicate pairs into a single row, count how many times each unordered pair occurs, and append that count as a third column next to the row that is repeated. If a pair is not repeated, then 1 should be written in the third column.

For example (a sample of the output file):

0   1    2
0   2    1
0   3    2
0   4    1 
0   5    1
0   6    1
0   7    1
0   8    1
0   9    1
0   10   1
1   11   1
1   12   1
1   13   1
1   14   1
1   15   1
1   16   1
1   17   1
1   18   1
1   19   1

How can I achieve it in R?

user3797829

2 Answers


Here is a way using `transform` with `pmin` and `pmax` to put each pair into a consistent order, and then `aggregate` to provide a count:

# data
x <- data.frame(a = c(rep(0, 10), rep(1, 10), 3), b = c(1:10, 0, 11:19, 0))

# logic: pmin/pmax put each pair into min-max order (transform evaluates both
# expressions against the original columns), and count = 1 gives aggregate a column to sum
aggregate(count ~ a + b, transform(x, a = pmin(a, b), b = pmax(a, b), count = 1), sum)
   a  b count
1  0  1     2
2  0  2     1
3  0  3     2
4  0  4     1
5  0  5     1
6  0  6     1
7  0  7     1
8  0  8     1
9  0  9     1
10 0 10     1
11 1 11     1
12 1 12     1
13 1 13     1
14 1 14     1
15 1 15     1
16 1 16     1
17 1 17     1
18 1 18     1
19 1 19     1
James
  • Give it some "data.table" power with a slight rewrite: `DT[, list(a = pmin(a, b), b = pmax(a, b), count = 1)][, .N, by = list(a, b)]` :-) – A5C1D2H2I1M1N2O1R2T1 Jul 14 '14 at 19:02
  • @AnandaMahto no need for that intermediate step: `DT[, .N, by = list(a = pmin(a, b), b = pmax(a, b))]` – eddi Jul 14 '14 at 19:06
  • @eddi, even nicer :-) – A5C1D2H2I1M1N2O1R2T1 Jul 14 '14 at 19:07
  • @AnandaMahto Nice. I struggled with a `dplyr` solution, as it seems there is a conceptual difference with it I didn't understand. I think I have it now. – James Jul 14 '14 at 19:07
  • This is a cool solution, but it's very special for the two variables case. On the other hand, Ananda's answer does it more generally but does so at the cost of converting to a matrix, which is not great. – eddi Jul 14 '14 at 19:09
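
To actually run the `data.table` one-liner from the comments, here is a minimal sketch; `DT` is simply the sample data from the answer rebuilt as a data.table:

library(data.table)

# the same sample data as above, built directly as a data.table
DT <- data.table(a = c(rep(0, 10), rep(1, 10), 3),
                 b = c(1:10, 0, 11:19, 0))

# group by the row-wise min/max of each pair; .N is the number of rows in each group
DT[, .N, by = list(a = pmin(a, b), b = pmax(a, b))]

Note that the count column comes out named `N` rather than `count` here.

On the `dplyr` point James mentions: in `mutate()`/`transmute()` each expression sees the results of the earlier ones, so `b = pmax(a, b)` would use the already-reordered `a`, unlike `transform()`. One way around it, sketched here with the same `x` as above (and not necessarily the solution James had in mind), is to give the reordered pair fresh names:

library(dplyr)

x %>%
  transmute(lo = pmin(a, b), hi = pmax(a, b)) %>%  # new names, so both expressions still see the original a and b
  group_by(lo, hi) %>%
  summarise(count = n())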

Here's one approach:

First, create a vector in which each row's values are sorted and then pasted together (`mydf` here is the two-column data from the question, e.g. as read in with `read.table`, which gives the default `V1`/`V2` column names seen in the output below).

x <- apply(mydf, 1, function(x) paste(sort(x), collapse = " "))

Then, use `ave` to create the counts you are looking for.

mydf$count <- ave(x, x, FUN = length)

Finally, you can use the "x" vector again, this time to detect and remove duplicated values.

mydf[!duplicated(x), ]
#    V1 V2 count
# 1   0  1     2
# 2   0  2     1
# 3   0  3     2
# 4   0  4     1
# 5   0  5     1
# 6   0  6     1
# 7   0  7     1
# 8   0  8     1
# 9   0  9     1
# 10  0 10     1
# 12  1 11     1
# 13  1 12     1
# 14  1 13     1
# 15  1 14     1
# 16  1 15     1
# 17  1 16     1
# 18  1 17     1
# 19  1 18     1
# 20  1 19     1
A5C1D2H2I1M1N2O1R2T1
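
Whichever approach is used, the result can be written back out as a plain whitespace-separated file with base R's `write.table`. A minimal sketch, where `res` holds the aggregated result from the first answer and `pairs_counted.txt` is a placeholder file name:

res <- aggregate(count ~ a + b, transform(x, a = pmin(a, b), b = pmax(a, b), count = 1), sum)

# quote, row.names and col.names switched off so the file mirrors the input layout plus the count column
write.table(res, "pairs_counted.txt", quote = FALSE, row.names = FALSE, col.names = FALSE)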