For example, if we have a graph 1-2-3 and delete vertex 2, the graph should become 1-3. I have a huge graph with 10000000+ vertices, so I can't delete and recreate all of them by hand. When I use delete.vertices(g, verticesToDelete), it automatically deletes the edges that those vertices had with their neighbors. Let's say we have a graph of Stack Overflow users and badges, where an edge means that a user has that badge. After deleting the badge vertices, I want edges between all the users that had the same badge. Below is a code sample:

users <- c(1,2,3,4,5,6,7,8)
badges <- c('Teacher','Teacher','Teacher','Student','Student','Student','Popular Question','Popular Question')
edgeList <- data.frame(users,badges)

library(igraph)
g <- graph_from_data_frame(edgeList,directed = FALSE)
plot(g)
verticesToDelete <- c('Teacher','Student','Popular Question')
g2 <- delete.vertices(g, verticesToDelete)
plot(g2)

# I want the graph to be like the one below after the deletions

users1 <- c(1,1,2,4,4,5,7)
users2 <- c(2,3,3,5,6,6,8)
edgeList2 <- data.frame(users1,users2)
g3 <- graph_from_data_frame(edgeList2,directed = FALSE)
plot(g3)
Mitsos
  • Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Mar 12 '19 at 05:10
  • @Tung I added a sample of my data and some code. – Mitsos Mar 12 '19 at 05:48
  • Old question I know, but can you `?connect` them all together or does that fail with a very large graph? – thelatemail Jul 24 '20 at 05:01
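The `connect` idea from the last comment can be sketched as follows. This is only a minimal illustration on the sample data from the question, not a tested solution for the full graph: it assumes igraph's `connect(graph, order)` and `delete_vertices()`, and the names `isBadge`, `gConnected` and `gUsers` are made up for this example. Every pair of users sharing a badge still becomes an explicit edge, so memory use on 10000000+ vertices remains an open question.

```r
library(igraph)

# Sample data from the question
users <- c(1, 2, 3, 4, 5, 6, 7, 8)
badges <- c('Teacher', 'Teacher', 'Teacher', 'Student', 'Student', 'Student',
            'Popular Question', 'Popular Question')
g <- graph_from_data_frame(data.frame(users, badges), directed = FALSE)

# Remember which vertices are badges before rewiring anything
isBadge <- V(g)$name %in% unique(badges)

# connect() links every pair of vertices that are at most 2 steps apart,
# so two users that share a badge become direct neighbours
gConnected <- connect(g, order = 2)

# Now the badge vertices can be dropped; the user-user edges remain
gUsers <- delete_vertices(gConnected, which(isBadge))

vcount(gUsers)  # number of remaining (user) vertices
ecount(gUsers)  # number of user-user edges
```

Deleting by vertex index rather than by name is a deliberate choice here, so the sketch does not depend on how vertex attributes are carried through `connect()`.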

1 Answer

How about this?

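library(igraph)
library(dplyr)

# users and badges are the vectors defined in the question
edgeList <- data.frame(users, badges)

# Self-join on badge: every pair of users that share a badge
edgeList_badges <- merge(edgeList, edgeList, by = "badges", all = TRUE)

# The badge column is no longer needed
edgeList_badges$badges <- NULL

# Drop pairs of a user with themselves
edgeList_badges <- edgeList_badges %>% filter(users.x != users.y)

# Drop mirrored pairs: keep 1-2, discard 2-1
edgeList_badges <- edgeList_badges[!duplicated(t(apply(edgeList_badges[1:2], 1, sort))), ]

g4 <- graph_from_data_frame(edgeList_badges, directed = FALSE)
plot(g4)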
  1. Merge the table edgeList with itself by badge to get all combinations of users with the same badge
  2. Delete the badges column: we do not need it
  3. Delete relations of users with themselves
  4. Delete permutations of users: if there is a link between 1 and 2, we do not need a link between 2 and 1 (keeping only rows where users.x < users.y would handle points 3 and 4 in one step)
  5. Enjoy your graph (if this was the graph you asked for...)

Here is another option

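library(igraph)
library(dplyr)
library(DescTools)

# users, badges and verticesToDelete are defined in the question
edgeList <- data.frame(users, badges)

# For each badge, build every 2-user combination of its holders
combSetTmp <- list()
for (badge in seq_along(verticesToDelete)) {
  tmp <- edgeList %>% filter(badges == verticesToDelete[badge]) %>% select(users)
  combSetTmp[[badge]] <- CombSet(tmp$users, 2)
}

# Stack the per-badge pair matrices into one two-column edge list
combSet <- do.call(rbind, combSetTmp)

g4 <- graph_from_edgelist(combSet, directed = FALSE)
plot(g4)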
  1. We filter users having the same badge
  2. Create all sets of those users
  3. Join all sets
  4. Draw the graph

It should be more "memory-friendly" than the merge, because it builds the pairs one badge at a time instead of one full self-join.
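What the question describes (linking users that share a badge, then dropping the badges) is a bipartite projection, and igraph has a function for that. The sketch below is an alternative to the two options above, not part of them. It assumes igraph's `bipartite_projection()` with a logical `type` vertex attribute; `userGraph` is a name invented for the example. It still creates one edge per pair of users sharing a badge, so it does not remove the size problem by itself.

```r
library(igraph)

# Sample data from the question
users <- c(1, 2, 3, 4, 5, 6, 7, 8)
badges <- c('Teacher', 'Teacher', 'Teacher', 'Student', 'Student', 'Student',
            'Popular Question', 'Popular Question')
g <- graph_from_data_frame(data.frame(users, badges), directed = FALSE)

# Mark the two vertex classes: TRUE for badges, FALSE for users
V(g)$type <- V(g)$name %in% unique(badges)

# Project onto the FALSE class (the users): two users are linked
# when they have at least one badge in common
userGraph <- bipartite_projection(g, which = "false")

vcount(userGraph)  # users only
ecount(userGraph)  # user-user edges
```

With the default `multiplicity = TRUE`, the projection also stores how many badges each pair shares in the `weight` edge attribute.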

LocoGris
  • It works for this small sample, but the original dataframe has millions of rows and the merged dataframe will have more than 2^31 - 1 rows, so it returns a merge error: [link](https://stackoverflow.com/questions/42479854/merge-error-negative-length-vectors-are-not-allowed) – Mitsos Mar 12 '19 at 07:07
  • How about my second guess? – LocoGris Mar 12 '19 at 07:24
  • I get the following error : Error in inds_combine(.vars, ind_list) : Position must be between 0 and n – Mitsos Mar 12 '19 at 08:03
  • It looks like the data is too big. Can you plot in several parts? – LocoGris Mar 12 '19 at 09:00
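Before building anything, it may help to estimate how large the result will be. The sketch below uses base R only and the sample data from the question; the names `usersPerBadge`, `mergeRows` and `totalPairs` are made up for the example. A badge held by n users contributes n^2 rows to the self-merge and choose(n, 2) user pairs, so a few very common badges can push the totals past the 2^31 - 1 limit mentioned above.

```r
# Sample data from the question
users <- c(1, 2, 3, 4, 5, 6, 7, 8)
badges <- c('Teacher', 'Teacher', 'Teacher', 'Student', 'Student', 'Student',
            'Popular Question', 'Popular Question')
edgeList <- data.frame(users, badges)

# Number of users holding each badge (as doubles, to avoid integer overflow)
usersPerBadge <- as.numeric(table(edgeList$badges))

# Rows produced by merging edgeList with itself on badges
mergeRows <- sum(usersPerBadge^2)

# Upper bound on user-user edges (pairs sharing several badges count once per badge)
totalPairs <- sum(choose(usersPerBadge, 2))

# Would the merge exceed R's 2^31 - 1 limit?
mergeRows > .Machine$integer.max
```

On the sample data this gives 22 merge rows and 7 pairs, matching the 7 edges of the desired graph g3.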