
I would like to draw network plots using tidygraph and ggraph.

I have a larger tibble of items connected via from and to columns. Some of the trees are connected to each other (the ones starting at a0 and b0 in the example).

I would like to:

  1. Count the number of independent trees
  2. Calculate the average maximum number of edges (connections) per independent tree. The maximum should be counted "downstream", i.e. from a0 to k2 or a4, not from a0 to b0, in the example data.

Example:

library(tidygraph)
library(igraph)
library(ggraph)
library(tidyverse)


# make edges
edges <- tibble(from = c("a0","a1","a2","a3","b0","b1","c0","c1","a2","k1"),
                to = c("a1","a2","a3","a4","b1","a3","c1","c2","k1","k2"))


# make nodes
nodes <- unique(c(edges$from, edges$to))
nodes <- tibble(node = nodes,
                label = nodes)


# make correct dataframe                 
routes_igraph <- graph_from_data_frame(d = edges,
                                       vertices = nodes,
                                       directed = TRUE)

routes_tidy <- as_tbl_graph(routes_igraph)

#plot network
ggraph(routes_tidy, layout = "tree") + 
  geom_edge_link() + 
  geom_node_point() + 
  theme_graph() +
  geom_node_text(aes(label = label), repel = TRUE)

Created on 2023-04-16 with reprex v2.0.2

Desired output

  1. Number of independent trees of the given edges and nodes: 2

  2. Average maximum edges per independent tree: 3.5 and 2

ava

1 Answer


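Here is one way. It borrows the function height from this SO post, modified to compute distances with mode = "in", so that the height of a vertex is its largest distance to any vertex downstream of it.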

height <- function(v, g) {
  D <- distances(g, to=v, mode="in")
  max(D[D != Inf])
}

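# weakly connected components, i.e. the independent trees
cmp <- components(routes_igraph)
# vertex names grouped by component
sp <- split(names(cmp$membership), cmp$membership)
# one sub-graph per component (induced.subgraph is the deprecated spelling)
sub_tree_list <- lapply(sp, \(v) induced_subgraph(routes_igraph, v))
# height of every vertex within its own sub-graph
sub_tree_height <- Map(\(g, v) sapply(v, height, g = g), sub_tree_list, sp)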

# number of components
length(sp)
#> [1] 2

# height of each sub-tree
sapply(sub_tree_height, max)
#> 1 2 
#> 4 2

Created on 2023-04-16 with reprex v2.0.2
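
Since the question works with tidygraph, the component count can also be sketched there. This is a minimal, standalone sketch rather than part of the code above: it rebuilds the example edges, and the names g, node_components and n_components are only illustrative. It assumes group_components(type = "weak") is an acceptable definition of "independent tree", which is the same notion components() uses by default.

```r
library(tidygraph)
library(dplyr)

# same example edges as in the question
edges <- tibble::tibble(
  from = c("a0","a1","a2","a3","b0","b1","c0","c1","a2","k1"),
  to   = c("a1","a2","a3","a4","b1","a3","c1","c2","k1","k2")
)

g <- as_tbl_graph(edges, directed = TRUE)

# label every node with the id of its weakly connected component
node_components <- g %>%
  activate(nodes) %>%
  mutate(component = group_components(type = "weak")) %>%
  as_tibble()

# number of independent trees
n_components <- n_distinct(node_components$component)
n_components
#> [1] 2
```

The component column stays on the node table, so it can also be mapped to colour in geom_node_point() to check the grouping visually.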


Edit

To get the maxima per initial node and their averages per sub-tree, this works.

# initial nodes are identified by a "0" in their name (a0, b0, c0)
initials_list <- lapply(sp, \(x) x[grep("0", x)])
sub_tree_max_height <- Map(\(g, v) sapply(v, height, g = g), sub_tree_list, initials_list)
sapply(sub_tree_max_height, mean)
#>   1   2 
#> 3.5 2.0

Created on 2023-04-16 with reprex v2.0.2
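
If the node names in the real data do not follow the "0" convention, the initial nodes can probably be found from the graph itself instead: a root is a vertex with no incoming edge. The following is a standalone sketch under that assumption. The names roots, root_height and avg_height are illustrative, and height is rewritten here with mode = "out" from the root, which should give the same distances as the version above. Note that distances() returns shortest paths, so this only equals the longest downstream path when there is a single route between two nodes, as in the example.

```r
library(igraph)

# same example edges as in the question
edges <- data.frame(
  from = c("a0","a1","a2","a3","b0","b1","c0","c1","a2","k1"),
  to   = c("a1","a2","a3","a4","b1","a3","c1","c2","k1","k2")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# roots: vertices without incoming edges
roots <- names(which(degree(g, mode = "in") == 0))

# largest finite distance from v to any vertex reachable downstream
height <- function(v, g) {
  d <- distances(g, v = v, mode = "out")
  max(d[is.finite(d)])
}

root_height <- sapply(roots, height, g = g)

# component id of each root, then the mean height per component
root_component <- components(g)$membership[roots]
avg_height <- tapply(root_height, root_component, mean)
avg_height
#>   1   2 
#> 3.5 2.0
```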

Rui Barradas
  • Thank you, this is very helpful. What would be 2) average maximum edges per independent tree (i.e. the average maximum edges of all subtrees that form that tree)? In my example it would be 3.5 and 2. – ava Apr 16 '23 at 11:46
  • Something like this works for me: `nodes %>% filter(str_detect(label,"0")) %>% pull(label) -> initials` `get_average <- function(x){ x %>% filter(ind%in%initials) %>% summarise(mean=mean(values))}` `sub_tree_height %>% map(~stack(.)) %>% map(as_tibble) %>% map(get_average)` – ava Apr 16 '23 at 12:21