Creating a Random Subgraph from a Starting Node

Question

I have this network graph in R:

set.seed(123)
library(igraph)

# Define a vector of names
names <- c("John", "Alex", "Jason", "Matt", "Tim", "Luke", "Shawn", "Henry", "Steven", "Scott", "Adam", "Jeff", "Connor", "Peter", "Andrew", "Dave", "Daniel", "Benjamin", "Joseph", "Martin")

# Create an empty graph with 20 nodes
g <- make_empty_graph(20)

# Add random edges between nodes to represent friendships
set.seed(123)  # for reproducibility
num_edges <- 40
edge_list <- sample(names, size = num_edges * 2, replace = TRUE)
edge_list <- matrix(edge_list, ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(edge_list, directed = FALSE)

# Set the node names to be the names vector
V(g)$name <- names

# Plot the graph
plot(g, vertex.label.cex = 0.7, vertex.label.color = "black", vertex.label.dist = 2)

My Question: Suppose I start with John - I want to make a random subgraph such that:

The "maximum degree" is = n
The "number of nodes" in the subgraph is = m
All nodes in the subgraph can be traced back to John

Using this previous question R: All Nodes of Degree "N" that can be reached from Some Node, I attempted to solve this problem:

all_distances = as.numeric(max(distances(g, "John")))
max_degree =  as.numeric(sample(0:all_distances, 1))
frame_with_max_degree = data.frame(unlist(ego(g, max_degree, "John")))
number_of_nodes = as.numeric(sample(0:nrow(frame_with_max_degree), 1))

My Question: But from here, I am not sure how to randomly select individual nodes number_of_nodes such that all these nodes are necessarily connected to "John".

For instance - suppose n = 3 and m = 2: I would not want to create a subgraph with "John" , "Jason" and "Steven" - because even though "Jason and Steven" might be within a randomly generated radius, "Jason and Steven" are still not directly connected to "John".

Can someone please show me how to do this?

Thanks!

clp · Answer 1 · 2023-03-13T12:42:11.450

An elementary approach is to perform a dfs() search and calculate the minimum distance to the root. Then remove the vertices in order of distance from the root.

Lazy approach, input from single vertex in graph g.

stopifnot(is_named(g))         # Named graph
n <- 3                         # Number of hubs allowed.
m <- 7                         # Number of vertices to maintain in result graph.
d1 <- distances(g, "John")[1,] # Distances from chosen vertex, e.g. "John" (as vector).
d2 <- d1[which(d1 <= n )]      # Tier 3 connections.

d3 <- sort(sample(d2))         # Shuffle and then sort in ascending order.                                              
d4 <- head(d3,  m)             # Keep first m vertices.

g2 <- induced_subgraph(g, names(d4)) 
plot(g2)
g2

Gives:

IGRAPH 225afed UN-- 7 9 -- 
+ attr: name (v/c)
+ edges from 225afed (vertex names):
[1] John  --Alex   John  --Matt   Jason --Matt   Jason --Matt   John  --Scott 
[6] Matt  --Scott  John  --Andrew John  --Martin Andrew--Martin

Update-1 Instead of calculating the distances directly from g, one could also take:

mst(g), its minimal spanning tree or
subgraph.edges(g, sample_spanning_tree(g)), a random spanning tree.

This is a very nice approach, +1! One more comment: You can use `sample` to shuffle `d1`,e.g., `d2 <- sort(sample(d1))` to create randomness in the vertex names, if you want to randomize the subgraph. — ThomasIsCoding, Mar 11 '23 at 20:13
@ThomasIsCoding, As you suggested I have used `sample` to shuffle `d2`. — clp, Mar 12 '23 at 22:25

ThomasIsCoding · Answer 2 · 2023-03-01T23:12:17.363

1

If I understand your purpose correctly, you want to retain paths departing from a given vertex, e.g., "John", in the sub-graph. I guess there should be some other approaches other than mine (I don't think my solution is efficient enough)

v <- s <- "John"
M <- m - 1
done <- FALSE
repeat {
  for (k in 1:n) {
    nbs <- unlist(ego(g, 1, s, "out", 1))
    s <- setdiff(names(sample(nbs, sample.int(min(length(nbs), M), 1))), v)
    v <- c(v, s)
    M <- M - length(s)
    if (M == 0) {
      done <- TRUE
      break
    }
  }
  if (done) break
}
gs <- subgraph(g, v)

For example, with n <- 3and m <- 7, we can obtain a subgraph like

edited Mar 01 '23 at 23:12

answered Mar 01 '23 at 20:54

ThomasIsCoding

96,636
9
24
81

@ ThomasIsCoding: thank you so much for your answer! Do you have any ideas about this question? https://stackoverflow.com/questions/75674642/correctly-understanding-loop-iterations ? Thank you so much! – stats_noob Mar 14 '23 at 16:18
@stats_noob Sorry that I have no idea about that – ThomasIsCoding Mar 14 '23 at 18:47
I have been working on a question over here - can you please take a look at it if you have time? https://stackoverflow.com/questions/76195071/r-randomly-sampling-groups-of-coin-flips thank you so much! – stats_noob May 11 '23 at 15:25

ThomasIsCoding · Accepted Answer · 2023-03-10T16:06:15.300

1

I think we can try another way around: Starting from make_ego_graph, we can reduce the size of ego network to the desired one by deleting vertices iteratively (but need to check the vertex connectivity all the way).

For example

n <- 3
m <- 7
src <- "John" # assume starting from "John"
gs <- make_ego_graph(g, order = n - 1, nodes = src)[[1]]
repeat{
  if (vcount(gs) == m) break # terminate the loop if we reached the desired size
  v_del <- sample(V(gs)[names(V(gs)) != src], 1) # random vertex to delete
  if (cohesion(gs - v_del)) { #check the connectivity
    gs <- gs - v_del
  }
}

then we have

> gs
IGRAPH fdc5e70 UN-- 7 9 -- 
+ attr: name (v/c)
+ edges from fdc5e70 (vertex names):
[1] John --Alex     Jason--Matt     Alex --Henry    Henry--Benjamin
[5] Alex --Henry    Matt --Adam     Jason--Matt     John --Matt
[9] Henry--Adam

edited Mar 10 '23 at 16:06

answered Mar 10 '23 at 10:29

ThomasIsCoding

96,636
9
24
81

I have been working on this question for a while - can you please take a look at it if you have time? https://stackoverflow.com/questions/75674642/r-correctly-understanding-loop-iterations thank you so much! – stats_noob Mar 11 '23 at 17:09
1

A slightly different approach is to delete the vertex and its dependents, if the resulting component of the source is big enough. `try <- gs - v_del` `if (length(subcomponent(try, src)) >= m ) gs <- try` – clp Mar 13 '23 at 11:11

clp · Answer 4 · 2023-03-15T13:52:28.897

Brute force approach: create all connected combinations and filter.

## Adapted from from combn {utils}
## Generate all combinations of the elements of x taken m at a time
## Nijenhuis, A. and Wilf, H.S. (1978) Combinatorial Algorithms for Computers and Calculators; Academic Press, NY.
## See also De Bruijn sequences, -graphs
##
## Combine and filter
## Exclude combinations not connected in graph, or not connected to starting vertex when not missing.
combn_f<- function (x, m, starting = NULL, graph) 
{
    smissing <- missing(starting)
    stopifnot(length(m) == 1L, is.numeric(m))
    if (m < 0) 
        stop("m < 0", domain = NA)

    n <- length(x)
    if (n < m) 
        stop("n < m", domain = NA)
    m <- as.integer(m)
    e <- 0
    h <- m
    a <- seq_len(m)
 
    r <- x[a]
    out <- list()

    if (m > 0) {
        i <- 0L
        nmmp1 <- n - m + 1L
        repeat {
            sr <- r
            if (!smissing) sr <- c(starting, r)
            if (is_connected(induced_subgraph(graph, V(g)[sr]))) {
              i        <- i + 1L
              out[[i]] <- sr
            }
            if (a[1L] == nmmp1) break          
            
            if (e < n - h) {
                h <- 1L
                e <- a[m]
                j <- 1L
            }
            else {
                e <- a[m - h]
                h <- h + 1L
                j <- 1L:h
            }
            a[m - h + j] <- e + j
            r <- x[a]        
        }
    }
    out
}

## test data
library(igraph)
g <- sample_gnp(21, 0.3)
V(g)$name <- seq_len(vcount(g))            # keep track of the vertices
names     <- V(g)$name

set.seed(20230315)
m   <- 7
system.time({
cmb <- combn_f(names[names != names[1]], m-1, starting = names[1], graph=g)    # from starting vertex
## cmb <- combn_f(names[names != names[1]], m-1, starting = NULL, graph=g)     
})
ix  <- sample(length(cmb),1)               # random choice
ff  <- cmb[ix][[1]]                        # connected friends
gs  <- induced_subgraph(g, ff)
plot(gs)

Brute force approach: create all connected combinations.

m <- 7
cc1 <- combn(names, m)                    # all possible combinations
cc2 <- cc1[, which(cc1 [1,] == names[1])] # starting with "John", first element in names

system.time({
cmb <- list()                             # all connected combinations in subgraph
for ( i in seq(ncol(cc2)) ){
  ppp <- cc2[, i]                         # select combination and test connectivity
  if (is_connected(induced_subgraph(g, V(g)[ppp]))) {
    cmb[[length(cmb) + 1]] <- ppp         # append in line
  }
}
})

##  user  system elapsed 
##  5.09    0.40    5.96 

ix <- sample(length(cmb),1)               # random choice
ff <- cmb[ix][[1]]                        # connected friends
gs <- induced_subgraph(g, ff)
plot(gs)

Allows a condition to be tested and filtered across all possible combinations.
A challenge is to combine and filter at the same time.

@ clp: thank you so much for your answer! I was just wondering - suppose the graph is big and you cant store all combinations. Is there some way around this? — stats_noob, Mar 14 '23 at 16:15
A solution is to copy the source of `combn` and insert the filter code at the appropriate place (line 64/65). Type `>combn` to retrieve the source code. — clp, Mar 14 '23 at 19:36

Creating a Random Subgraph from a Starting Node

4 Answers4

Linked