
So I'm trying to do a network analysis with igraph in R, but I'm an R newbie.
My Excel database looks like this, just bigger: [image: Excel database]

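library(tidyr)   # separate()
library(igraph)  # graph_from_edgelist()

# Split the comma-separated contacts into up to 20 columns
test <- separate(ID_test, 'Contacts 1', paste("Contacts", 1:20, sep = "_"), sep = ",", extra = "drop")
m <- as.matrix(test)
# Pair the ID column with every contact column (the IDs are recycled)
el <- cbind(m[, 1], c(m[, -1]))
el2 <- na.omit(el)  # drop pairs without a contact
testel <- graph_from_edgelist(el2, directed = FALSE)
plot(testel)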

I separated the Contacts data into multiple columns and converted the result to a matrix so I could create an edge list with cbind.
After that I deleted every row containing an NA.
With igraph I could then plot my undirected network.

The problem is that I have duplicate rows where the IDs are just switched between V1 and V2.
Therefore I have, e.g., ID_009 and ID_003 twice in my network, because the edge list looks like this:

[image: edgelist]

As you can see, the first and the last row are basically the same, just with the values switched between V1 and V2.

I already tried multiple solutions but none seems to work for me.

el4<-el3[!duplicated(el[c("V1", "V2")]),] # doesn't recognize the right duplicates, as in the example above
el4<-el3[!duplicated(paste(pmin(V1, V2), pmax(V1, V2)))] # Error in pmin(V1, V2) : object 'V1' not found
el4<-el3[!duplicated(paste(pmin("V1", "V2")), pmax("V1", "V2"))] # creates values from which I can't build a network
g <- graph_from_edgelist(unique(rbind(el2[, 1:2])), directed = FALSE) # doesn't change anything
plot(g)
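
The second attempt fails because `V1` and `V2` are not standalone objects; they have to be looked up inside the edge list. A minimal sketch of that idea, assuming the edge list is a data frame with character columns `V1` and `V2` and no stray whitespace in the IDs (the name `el_df` and its values are made up for illustration):

```r
# Hypothetical sample edge list; el_df and its values are illustrative only.
el_df <- data.frame(V1 = c("ID_003", "ID_004", "ID_009"),
                    V2 = c("ID_009", "ID_001", "ID_003"),
                    stringsAsFactors = FALSE)

# Build an order-independent key: the smaller ID first, the larger ID second.
key <- with(el_df, paste(pmin(V1, V2), pmax(V1, V2)))

# Keep the first occurrence of each unordered pair.
# The comma after the index selects rows rather than columns.
el_unique <- el_df[!duplicated(key), ]
```

For a matrix such as `el2`, the columns would be addressed as `el2[, 1]` and `el2[, 2]` instead of by name.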
Dave2e
  • Welcome to SO, MemeBeauftragter! (1) Sample data would be really useful here, please see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info for the discussions on `dput`, `data.frame`, and `read.table` to provide unambiguous representative sample (not big) data. (2) Please don't include just images of data, see https://meta.stackoverflow.com/a/285557 (and https://xkcd.com/2116/). (3) From the images of data, it appears that some strings might have embedded spaces, e.g., `"ID_009"` (no-space) vs `"ID_001 "` (trailing space), this would defeat `duplicated`. – r2evans Jan 16 '23 at 17:07
  • @r2evans Thank you very much! The trailing spaces were the problem. I trimmed my data frame, and with the solution below I could eliminate the duplicates. For future questions I'll also pay attention to your comments about providing better information. – MemeBeauftragter Jan 17 '23 at 09:26
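
The fix described in these comments can be sketched roughly as follows. It assumes a character matrix edge list like `el2` from the question; `el_raw` and its values are made up for illustration, and `trimws()` is base R:

```r
# Hypothetical edge list with a trailing space in one ID (illustrative values).
el_raw <- cbind(c("ID_003", "ID_004", "ID_009 "),
                c("ID_009", "ID_001", "ID_003"))

# Strip leading/trailing whitespace; assigning into [] preserves the matrix shape.
el_trim <- el_raw
el_trim[] <- trimws(el_raw)

# Sort each row so that (a, b) and (b, a) become identical, then flag repeats.
dups <- duplicated(t(apply(el_trim, 1, sort)))
el_clean <- el_trim[!dups, , drop = FALSE]
```

Without the trimming step, `"ID_009 "` and `"ID_009"` compare as different strings, so the switched pair would not be flagged.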

1 Answer


Here is an example of how to remove rows that are duplicates regardless of the order of their values:

el <- data.frame(V1 = paste0("id_00", c(3,rep(4,8), rep(9,3))), 
                 V2 = paste0("id_00", c(9,1,2,3,5,6,7,8,9,1,2,3)))

el
#>        V1     V2
#> 1  id_003 id_009
#> 2  id_004 id_001
#> 3  id_004 id_002
#> 4  id_004 id_003
#> 5  id_004 id_005
#> 6  id_004 id_006
#> 7  id_004 id_007
#> 8  id_004 id_008
#> 9  id_004 id_009
#> 10 id_009 id_001
#> 11 id_009 id_002
#> 12 id_009 id_003
dups <- duplicated(t(apply(el, 1, sort)))

el[!dups, ]
#>        V1     V2
#> 1  id_003 id_009
#> 2  id_004 id_001
#> 3  id_004 id_002
#> 4  id_004 id_003
#> 5  id_004 id_005
#> 6  id_004 id_006
#> 7  id_004 id_007
#> 8  id_004 id_008
#> 9  id_004 id_009
#> 10 id_009 id_001
#> 11 id_009 id_002
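
As an alternative sketch that is not part of the approach above: once the graph has been built, igraph's `simplify()` can collapse multiple edges between the same pair of vertices, which in an undirected graph covers switched pairs as well. This assumes the igraph package is installed and that the IDs contain no stray whitespace; `el_m` and its values are made up for illustration.

```r
library(igraph)

# Hypothetical edge list in which the last row repeats the first with the IDs switched.
el_m <- cbind(c("ID_003", "ID_004", "ID_009"),
              c("ID_009", "ID_001", "ID_003"))

g <- graph_from_edgelist(el_m, directed = FALSE)

# Merge parallel edges; self-loops are left untouched here.
g_simple <- simplify(g, remove.multiple = TRUE, remove.loops = FALSE)

n_before <- ecount(g)
n_after <- ecount(g_simple)
```

This changes only the graph object, so the underlying edge list still contains the repeated pair.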
Ric