
I have a network table saved as a CSV file (read into a data frame) that looks like this:

a b 1
b a 3
a c 2
a d 2
c a 2

I want to save the pairs of values that are repeated in either order. For example, in this case

a b 1
b a 3

should be saved as follows:

a b
a c

Other values should be omitted. How can I achieve this in R? Thanks in advance!

Update: My file is also really large (about 100MB, probably 70 thousand rows), so I need a solution that runs fast. I tried sorting each row first and then checking for duplicates, but it is too slow.

Here is my code:

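ud <- function(df){
  # Sort the two node labels within each row, so "b a" becomes "a b"
  df[1:2] <- t( apply(df[1:2], 1, sort) )
  # Keep rows whose sorted pair has already appeared earlier
  out <- df[duplicated(df[1:2]),]
  # Drop the third column (the interaction value)
  out[3] <- NULL
  write.table(out, file="D:/out.txt", sep=" ", row.names=FALSE, col.names=FALSE)
}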
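
The slow step in the code above is most likely the row-wise apply(..., 1, sort), which calls sort once per row. A possible vectorized variant is sketched below. It is only a sketch under some assumptions: the function name reciprocal_pairs and the column names V1, V2, V3 are made up for illustration, the first two columns hold node labels, and no label contains a tab character (the tab is used to build a combined key).

```r
# Hypothetical helper: returns the pairs that occur more than once
# once the order within each pair is ignored.
reciprocal_pairs <- function(df) {
  a <- as.character(df[[1]])
  b <- as.character(df[[2]])
  # pmin/pmax order each pair element-wise, with no per-row sort() call
  lo <- pmin(a, b)
  hi <- pmax(a, b)
  # Assumption: labels never contain "\t", so the key is unambiguous
  key <- paste(lo, hi, sep = "\t")
  # TRUE for every occurrence of a pair after its first one
  keep <- duplicated(key)
  data.frame(V1 = lo[keep], V2 = hi[keep], stringsAsFactors = FALSE)
}

# The sample table from the question
edges <- data.frame(
  V1 = c("a", "b", "a", "a", "c"),
  V2 = c("b", "a", "c", "d", "a"),
  V3 = c(1, 3, 2, 2, 2),
  stringsAsFactors = FALSE
)
out <- reciprocal_pairs(edges)
print(out)
```

Like the original function, this keeps one row for every repeat, so a pair that occurs three times would appear twice; wrapping the result in unique() would collapse those. The result can be passed to the same write.table call as before.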
dexhunter
  • For your information, the third column holds the values of the interaction between col1 and col2; these can also be seen as edges. – dexhunter Jul 17 '16 at 10:09
  • My file is also really large (about 100MB, probably 70 thousand rows), so I also need a fast solution.. Thanks. – dexhunter Jul 17 '16 at 10:16
  • I provided a solution for http://stackoverflow.com/questions/29170099/remove-duplicate-column-pairs-sort-rows-based-on-2-columns that is more efficient for large data. – Martin Morgan Jul 17 '16 at 15:25
