4

I would like to obtain the sequence of nodes along the shortest path between two nodes using tidygraph. Consider this example.

library(tidygraph)
library(tidyverse)
demo_netw <- tbl_graph(nodes = tibble(node_id = c("A", "B", "C", "D")),
                       edges = tribble(~from, ~to,
                                       "B", "A",
                                       "D", "C",
                                       "A", "D"))
shortest_path_from_B_to_C <-
  demo_netw %>%
  convert(to_shortest_path, node_id == "B", node_id == "C")
shortest_path_from_B_to_C

## # A tbl_graph: 4 nodes and 3 edges
## #
## # A rooted tree
## #
## # Node Data: 4 x 2 (active)
##   node_id .tidygraph_node_index
##   <chr>                   <int>
## 1 A                           1
## 2 B                           2
## 3 C                           3
## 4 D                           4
## #
## # Edge Data: 3 x 3
##    from    to .tidygraph_edge_index
##   <int> <int>                 <int>
## 1     2     1                     1
## 2     4     3                     2
## 3     1     4                     3

The output shows that the nodes A, B, C, and D are on the shortest path, but it does not show that the sequence of nodes is B -> A -> D -> C. The returned edge data does not reveal the sequence of the edges either.

I am aware that I can accomplish such tasks with igraph.

library(igraph)
demo_igraph <-
  demo_netw %>%
  activate(edges) %>%
  as_tibble() %>%
  graph_from_data_frame()

# We cannot easily access the node_id column, so we must manually make the
# mapping "B" -> "2", "C" -> "3"
shortest_paths(demo_igraph, "2", "3")$vpath

## [[1]]
## + 4/4 vertices, named, from a854191:
## [1] 2 1 4 3

However, this is inelegant for several reasons.

  • I am looking for a tidygraph solution that does not resort to other packages.
  • When exporting the tidygraph edge data, the information contained in the node data column node_id is lost, so I must either manually make the mapping "B" -> "2", "C" -> "3" or write much more elaborate code to join the information from the node and edge data.
  • I would like the output to be "B" "A" "D" "C", not 2 1 4 3.

Is there some straightforward way to obtain the sequence of nodes along the shortest path directly with tidygraph?

1 Answers1

4

EDIT: It is possible to use any name as the node_key parameter, which will result in a successful construction of a tbl_graph. However, passing this onto igraph functions works only when the column is called name within the node data. This might be an issue to report to tidygraph.

It is possible to do this directly with tidygraph, by using igraph functions, considering the following:

  1. A tbl_graph object subclasses igraph so there is no need to convert your data to a tibble and then to an igraph from a data frame, you can directly run igraph functions to a tbl_graph object.
  2. It is possible to set a node_key argument when constructing your graph. This is passed onto igraph and therefore stored within its attributes. HOWEVER, using node_id as you did in your example will not work, as igraph uses this same name internally for the node indices, and therefore will be somehow overwritten. So if you call the column of your node keys different than "node_id" you can then set this as your node_key parameter.
  3. According to tidygraph it should be possible to pass a column name as a node_key, see here.

node_key The name of the column in nodes that character represented to and from columns should be matched against. If NA the first column is always chosen. This setting has no effect if to and from are given as integers.

If the column with the IDs is called name, then this is also recognized by igraph, and will return named paths when calling a shortest_paths() function. However this seems to fail when passing any other node column as a node_key, so for this example we can call the column name.

See below your same code with these few modifications during construction, and the output you asked for.

library(tidygraph)
library(tidyverse)
library(igraph)

demo_netw <- tbl_graph(nodes = tibble(name = c("A", "B", "C", "D")),
                       edges = tribble(~from, ~to,
                                       "B", "A",
                                       "D", "C",
                                       "A", "D"))

shortest_paths(demo_netw, "B", "C")$vpath
#> [[1]]
#> + 4/4 vertices, named, from e00d5b2:
#> [1] B A D C
loreabad6
  • 195
  • 1
  • 7
  • Thank you! My confusion was caused by igraph not associating the edges with `node_id`. Because `node_id` behaves unexpectedly, it would be helpful if `tbl_graph()` could issue a warning to change the column name. It would also be nice if `convert(to_shortest_path)` could return the order of the nodes and edges as additional columns in the node data and edge data, respectively. Then one could keep the code completely tidygraph-based. Just one small comment: the argument `node_key` can be omitted in this example. It appears as if the first column name is by default chosen as `node_key`. – Michael Gastner Dec 31 '20 at 11:20
  • @MichaelGastner thank you for the remark. I checked and indeed `node_key` can be omitted because `tidygraph` defaults to `node_key = 'name'`. If set to `NA` it uses the first node column ([see here](https://tidygraph.data-imaginist.com/reference/tbl_graph.html)), i.e. the network construction should work with the column `node_id` and `node_key = NA`, but it doesn't. Strangely, to add any other column as a `node_key` there has to be a `name` column present, and this is the actual reason your code did not work. This might be an issue worth reporting. I will update my answer to reflect this. – loreabad6 Jan 04 '21 at 17:09