Is there a way to select a subgraph/subset where each cluster has at most a given number of vertices?

Essentially I want to do something like:

want <- components(X)$csize < 20  

I thought about merging the cluster IDs from the graph data frame into the node data frame, then using a count or something similar to subset the original data frame and compute the graph data frame again.

williamg15
  • 2
    Right now I can only answer with "yes". An answer to the obvious follow-up question "how?" requires a reproducible example. I would appreciate if you could provide one. Please refer to this FAQ: https://stackoverflow.com/a/5963610/1412059 – Roland Oct 18 '18 at 09:12
  • 1
    Welcome to Stack Overflow! Other than the fact that the function is named `components` (not `component`) your line of code should work fine. What do you need that your code is not already doing? – G5W Oct 18 '18 at 12:28
  • The above code returns a logical. I figured it out I think: X[X$csize < 20] – williamg15 Oct 18 '18 at 13:58

1 Answer

Here is a potential solution using a random graph. Use groups on the result of components to find which nodes belong to which component, then use length to find how big each component is:

library(igraph)  # also provides the %>% pipe used below
set.seed(4321)
g <- sample_gnm(100, 40, directed = FALSE, loops = FALSE)
plot(g, vertex.size = 5, vertex.label = '')

[Figure: the entire graph with all components]

want <- g %>%
  components %>%
  groups %>%
  .[sapply(., length) > 3]

want will provide the following:

$`1`
[1]  1 34 38 45 75

$`3`
 [1]   3  12  24  39  50  54  58  60  67  84  97  99 100

$`5`
[1]  5 35 37 41 44 53 65 90

Then you can remove all nodes that aren't included in want:

newG <- g %>%
  {. - V(.)[! as.numeric(V(.)) %in% unlist(want)]}

plot(newG, vertex.size = 5, vertex.label = '')
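If you want an upper bound on component size instead, as in the question's csize < 20, a minimal sketch along the same lines could work directly from the membership and csize fields that components returns, and then use induced_subgraph to keep the matching vertices. The names max_size, keep_ids, keep_vertices and sub_g are made up for illustration, and the threshold of 20 is only taken from the question:

```r
library(igraph)

set.seed(4321)
g <- sample_gnm(100, 40, directed = FALSE, loops = FALSE)

comp <- components(g)   # list with $membership, $csize and $no
max_size <- 20          # hypothetical threshold, taken from the question

# IDs of the components that have fewer than max_size vertices
keep_ids <- which(comp$csize < max_size)

# Indices of the vertices that belong to one of those components
keep_vertices <- which(comp$membership %in% keep_ids)

# Subgraph containing only those vertices and the edges between them
sub_g <- induced_subgraph(g, keep_vertices)
```

Flipping the comparison (for example comp$csize > 3) should select the same components as the groups/length approach above. Note that vertex IDs are renumbered in the subgraph, so set a name attribute beforehand if you need to trace vertices back to the original graph.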

struggles