How to remove rows that are permutations of other rows in a given matrix

Question

Assume we have a matrix r, built from a matrix m as follows:

    m=structure(c(1, 0.996805114543033, 0.987281571590291, 0.971610767189123, 
    0.950088633802627, 0.996805114543033, 0.993620436379149, 0.984127320055285, 
    0.996805114543033, 1, 0.996805114543033, 0.987281571590291, 0.971610767189123, 
    0.993620436379149, 0.996805114543033, 0.993620436379149, 0.987281571590291, 
    0.996805114543033, 1, 0.996805114543033, 0.987281571590291, 0.984127320055285, 
    0.993620436379149, 0.996805114543033, 0.971610767189123, 0.987281571590291, 
    0.996805114543033, 1, 0.996805114543033, 0.968506582079198, 0.984127320055285, 
    0.993620436379149, 0.950088633802627, 0.971610767189123, 0.987281571590291, 
    0.996805114543033, 1, 0.947053209443661, 0.968506582079198, 0.984127320055285, 
    0.996805114543033, 0.993620436379149, 0.984127320055285, 0.968506582079198, 
    0.947053209443661, 1, 0.996805114543033, 0.987281571590291, 0.993620436379149, 
    0.996805114543033, 0.993620436379149, 0.984127320055285, 0.968506582079198, 
    0.996805114543033, 1, 0.996805114543033, 0.984127320055285, 0.993620436379149, 
    0.996805114543033, 0.993620436379149, 0.984127320055285, 0.987281571590291, 
    0.996805114543033, 1), .Dim = c(8L, 8L))
    
    k=5
    
    inds <- which(`dim<-`(m %in% head(sort(c(m)), k), dim(m)), arr.ind = TRUE)
    
    r=inds[order(m[inds]), ]
    
    print(r)
    
          row col
    [1,]   6   5
    [2,]   5   6
    [3,]   5   1
    [4,]   1   5
    [5,]   6   4
    [6,]   7   5
    [7,]   4   6
    [8,]   5   7
    
    dput(r)

    structure(c(6L, 5L, 5L, 1L, 6L, 7L, 4L, 5L, 5L, 6L, 1L, 5L, 4L, 
    5L, 6L, 7L), .Dim = c(8L, 2L), .Dimnames = list(NULL, c("row", 
    "col")))

I want to drop the duplicated rows of the matrix r. In this context, two rows are duplicates if one is a permutation of the other. For example, row1 = c(6, 5) and row2 = c(5, 6) are duplicates, so I need to remove one of them.

Thank you for your help!

score 4 · Answer 1 · answered Apr 18 '21 at 22:11


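Here is one option using the igraph package, which treats each row as an undirected edge:

    library(igraph)

    r %>%
      graph_from_data_frame(directed = FALSE) %>%
      simplify() %>%
      get.data.frame()

which gives a data frame with one row per unordered pair, because `simplify()` merges the duplicate edges.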


ThomasIsCoding


akrun · Accepted Answer · 2021-04-18T22:14:49.973

3

We can loop over the rows with `apply` (`MARGIN = 1`), sort the elements of each row, transpose the result (`apply` returns the sorted rows as columns), and then use `duplicated` on that matrix to drop the duplicate rows from r:

    r1 <- r[!duplicated(t(apply(r, 1, sort))), ]

Output:

    r1
    #     row col
    #[1,]   6   5
    #[2,]   5   1
    #[3,]   6   4
    #[4,]   7   5
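
As a small illustration of why the transpose is needed, the sketch below inspects the intermediate shapes. It rebuilds `r` from the `dput()` output in the question; the names `sorted` and `key` are only illustrative.

```r
# r as given by dput() in the question
r <- structure(c(6L, 5L, 5L, 1L, 6L, 7L, 4L, 5L, 5L, 6L, 1L, 5L, 4L,
                 5L, 6L, 7L), .Dim = c(8L, 2L),
               .Dimnames = list(NULL, c("row", "col")))

sorted <- apply(r, 1, sort)  # each sorted row comes back as a COLUMN
dim(sorted)                  # 2 8

key <- t(sorted)             # transpose back to one sorted pair per row
dim(key)                     # 8 2

# duplicated() on a matrix compares whole rows, so permutations now match
duplicated(key)
```

Without `t()`, `duplicated` would compare the two long rows of `sorted` rather than the eight pairs.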

Or using `pmin`/`pmax` in a vectorized way:

    r[!duplicated(cbind(pmin(r[, 1], r[, 2]), pmax(r[, 1], r[, 2]))), ]
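
The one-liner can be unpacked step by step. This is only a sketch of the same idea with intermediate variables; `lo`, `hi`, `key` and `dup` are names introduced here for illustration, and `r` is rebuilt from the `dput()` output in the question.

```r
# r as given by dput() in the question
r <- structure(c(6L, 5L, 5L, 1L, 6L, 7L, 4L, 5L, 5L, 6L, 1L, 5L, 4L,
                 5L, 6L, 7L), .Dim = c(8L, 2L),
               .Dimnames = list(NULL, c("row", "col")))

lo  <- pmin(r[, 1], r[, 2])  # element-wise smaller value of each pair
hi  <- pmax(r[, 1], r[, 2])  # element-wise larger value of each pair
key <- cbind(lo, hi)         # every row rewritten as (smaller, larger)

dup <- duplicated(key)       # TRUE where the same pair was seen earlier
r1  <- r[!dup, ]             # keep the first occurrence of each pair
r1
```

Because `pmin` and `pmax` work on whole vectors at once, no explicit loop over the rows is needed.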

edited Apr 18 '21 at 22:14

answered Apr 18 '21 at 22:07

akrun


Thank you @akrun. I didn't understand the second solution. Will `pmin`/`pmax` work for matrices with more than 2 columns? – Tou Mou Apr 18 '21 at 22:20

@TouMou For that you may need some changes in the code, i.e. `do.call` and converting to data.frame – akrun Apr 18 '21 at 22:21
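
For more than two columns, one possible approach (not necessarily what the comment above had in mind) is to reuse the row-wise sort from the first solution, since the minimum and maximum alone no longer identify a row's values. The matrix `m3` below is a hypothetical example, not data from the question.

```r
# Hypothetical 3-column matrix:
#   rows 1 and 3 are permutations of each other;
#   rows 2 and 4 share the same min and max but differ in the middle value.
m3 <- matrix(c(1, 2, 3,
               1, 5, 9,
               3, 1, 2,
               1, 7, 9),
             ncol = 3, byrow = TRUE)

# Sort every row so that permutations become identical rows
key3 <- t(apply(m3, 1, sort))

# Keep the first occurrence of each sorted row
res <- m3[!duplicated(key3), , drop = FALSE]
res
```

Here only row 3 is dropped; rows 2 and 4 are kept because they are not permutations of each other, which a min/max key alone would not detect.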
