I believe this can be solved with igraph
in a similar way as in “recursive” self join in data.table but without the calculation.
The difficulty here is that there are separate graphs for each Item
. My approach is to split the data frame into a list of graphs. There might be more concise solutions which use the type
vertex attribute.
However, the code below creates the expected result:
library(igraph)
library(data.table)
library(magrittr)
lapply(
lapply(split(lctolc, lctolc$Item), function(x) graph.data.frame(x[, 2:3])),
function(x) lapply(
V(x)[degree(x, mode = "in") == 0],
function(s) all_simple_paths(x, from = s,
to = V(x)[degree(x, mode = "out") == 0]) %>%
lapply(
function(y) as.data.table(t(names(y))) %>% setnames(paste0("LC", seq_along(.)))
) %>%
rbindlist(fill = TRUE)
) %>% rbindlist(fill = TRUE)
) %>% rbindlist(fill = TRUE, idcol = "Item")
Item LC1 LC2 LC3 LC4
1: 8T4121 MN12 AB12 BC34 <NA>
2: 8T4121 MW92 WK14 RS11 OY01
3: 8T4121 MW92 WK14 RM11 <NA>
4: AB7651 MW92 RS11 OY01 <NA>
Explanation
The igraph
package is a good choice for questions like this.
However, we need to treat the graph of each Item
separately. This is achieved by splitting the data.frame and creating a list of graphs by
lg <- lapply(split(lctolc, lctolc$Item), function(x) graph.data.frame(x[, 2:3]))
which returns
lg
$`8T4121`
IGRAPH 8eb2bcc DN-- 8 6 --
+ attr: name (v/c)
+ edges from 8eb2bcc (vertex names):
[1] AB12->BC34 MN12->AB12 MW92->WK14 WK14->RM11 WK14->RS11 RS11->OY01
$AB7651
IGRAPH 7cd75e7 DN-- 3 2 --
+ attr: name (v/c)
+ edges from 7cd75e7 (vertex names):
[1] MW92->RS11 RS11->OY01
or, visualised by two separate plots.
lapply(seq_along(lg), function(i) plot(lg[[i]], main = names(lg)[i]))

Now, the function all_simple_paths()
lists simple paths from one source vertex to another vertex or vertices where a path is simple if the vertices are visited once at most. To use the function we need to determine the start nodes and all end nodes. This is achieved by
V(x)[degree(x, mode = "in") == 0] # start nodes
V(x)[degree(x, mode = "out") == 0] # end nodes
The degree()
function returns the number of in-coming or out-going edges, resp.
For our example dataset we get
lapply(lg, function(x) V(x)[degree(x, mode = "in") == 0]) # start nodes
$`8T4121`
+ 2/8 vertices, named, from 8eb2bcc:
[1] MN12 MW92
$AB7651
+ 1/3 vertex, named, from 7cd75e7:
[1] MW92
lapply(lg, function(x) V(x)[degree(x, mode = "out") == 0]) # end nodes
$`8T4121`
+ 3/8 vertices, named, from 8eb2bcc:
[1] BC34 RM11 OY01
$AB7651
+ 1/3 vertex, named, from 7cd75e7:
[1] OY01
Now, we loop through all start nodes of each graph and determine all simple paths. The result is a list, again. For each list item, the node names are extracted and reshaped to a data.table in wide format. The columns are renamed to LC1
, LC2
, etc.
In each step, we get a list of data.tables which are combined by rbindlist()
. The fill
parameter is required as the number of columns may vary. The final call to rbindlist() uses the idcol
parameter to mark the rows which are associated with Item
.
Data
The sample dataset has been amended to include the cases from OP's comments here and here.
library(data.table)
lctolc <- fread("
Item LC ToLC
8T4121 AB12 BC34
8T4121 MN12 AB12
8T4121 MW92 WK14
8T4121 WK14 RM11
8T4121 WK14 RS11
8T4121 RS11 OY01
AB7651 MW92 RS11
AB7651 RS11 OY01",
data.table = FALSE)