Subset by row and column reciprocity

Question

I am having trouble subsetting a data.frame based on reciprocity of values in row and column.

Here is a example df to illustrate the problem:

rater <- c(21, 23, 26, 24)
ratee <- c(24, 21, 23, 21)
rating.data <- data.frame(rater, ratee)

Output:

   rater ratee
1    21    24
2    23    21
3    26    23
4    24    21

I would like to subset this df by only keeping the rows that have reciprocal values.

The resulting subset should look like this:

   rater ratee
1    21    24
4    24    21

Any thoughts would be much appreciated!

You using the word reciprocal wrongly. 1/x is the reciprocal value of x. — Andre Elrico, Sep 05 '17 at 15:36
https://stackoverflow.com/questions/28574006/unique-rows-considering-two-columns-in-r-without-order also check the link — BENY, Sep 05 '17 at 15:46
I have a matrix from using the igraph package, an adjacency matrix. That package can only provide a ratio of the reciprocal values. I am using the language they used for network analysis. — SeekingData, Sep 05 '17 at 15:47

score 9 · Answer 1 · answered Sep 05 '17 at 15:33

9

We could sort by row and then use duplicated

m1 <- t(apply(rating.data, 1, sort))
rating.data[duplicated(m1)|duplicated(m1, fromLast = TRUE),]
#   rater ratee
#1    21    24
#4    24    21

answered Sep 05 '17 at 15:33

akrun

874,273
37
540
662

1

This is great! I have been trying to think of how to use duplicated in for the problem. This works well! Thank you! – SeekingData Sep 05 '17 at 15:39
1

@SeekingData if this answer help you , you can accept it :) – BENY Sep 05 '17 at 15:41
I wish I could accept both. Thank you, Akrun! – SeekingData Sep 05 '17 at 16:01

moodymudskipper · Accepted Answer · 2017-09-05T16:00:12.187

6

Another possibility:

library(dplyr)
rating.data %>% inner_join(.,.,by=c("rater" = "ratee","ratee"="rater"))

Or this, for some reason it's twice faster on your small example (though slower than akrun's solution):

merge(rating.data,setNames(rating.data,rev(names(rating.data))))

to keep the second solution flexible with your additional columns:

merge(rating.data,setNames(rating.data[,c("rater","ratee")],c("ratee","rater")))

edited Sep 05 '17 at 16:00

answered Sep 05 '17 at 15:43

moodymudskipper

46,417
11
121
167

Thanks! This is actually what I was looking for as I did not want to loose the rest of the df. This allows me to specify the columns I have in mind. Thank you for reading my mind! – SeekingData Sep 05 '17 at 15:50
You're welcome, akrun's solution could easily be adapted though, if you just put `rating.data[,c("rater","ratee")]` in the first line, it might be faster too if it's a concern. – moodymudskipper Sep 05 '17 at 15:57

score 2 · Answer 3 · answered Sep 05 '17 at 15:50

library(data.table)
N=10#number of rows 
dt1<-data.table(a=1:N,b=sample(N))#create the data.table that holds the info

dt1[,d:=ifelse(a<b,paste0(a,"_",b),paste0(b,"_",a))]#create unique key per pair  respecting the rule "min_max"
setkey(dt1,d)#setting the key 
dt1[dt1[,.N,d][N!=1],.(a,b)] #keep only the pairs that appear more than once

score 2 · Answer 4 · answered Sep 05 '17 at 16:20

2

You can also use pmin and pmax to assist with grouping and then filter on all groups having more than one entry, i.e.

library(dplyr)

df %>% 
 group_by(grp = paste0(pmin(rater, ratee), pmax(rater, ratee))) %>% 
 filter(n() > 1) %>% 
 ungroup() %>% 
 select(-grp)

which gives,

# A tibble: 2 x 2
  rater ratee
  <dbl> <dbl>
1    21    24
2    24    21

answered Sep 05 '17 at 16:20

Sotos

51,121
6
32
66

1

I thought about the `pmin/pmax` approach. It would be faster – akrun Sep 06 '17 at 04:27

score 1 · Answer 5 · answered Sep 05 '17 at 16:10

Similar in spirit to akrun's method, rbind the data.frame to a copy with the columns reversed. find duplicates, starting from the bottom, which will then return TRUE for the rows in the original data.frame. Subset this logical vector with [seq_len(nrow(rating.data))] to return a vector of the appropriate length to [ in order to subset on the rows.

rating.data[duplicated(rbind(rating.data,
                             unname(unclass(rating.data[2:1]))),
                       fromLast=TRUE)[seq_len(nrow(rating.data))],]
  rater ratee
1    21    24
4    24    21

Subset by row and column reciprocity

5 Answers5