sequence events consistent with a two-column table of connections

Question

I have a table that show connections between events:

library(data.table)
df = data.table(p1 = c("x0", "x0", "x1", "x2", "x3"),
                p2 = c("x1", "x2", "x3", "x3", "x4"))

Here is an illustration:

The next event may happen only if all previous events have already happened. For example, event x3 may happen only after x1 and x2 irrespective of their sequence.

How can I convert the df table into following one (where all events appear in some permissible order), in a data.table way:

df_required = data.table(p = c("x0", "x1", "x2", "x3", "x4", 
                               "x0", "x1", "x2", "x3", "x4"),
                         sequence = c(1, 2, 3, 4, 5, 1, 3, 2, 4, 5),
                         group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))

The required table shows two possible groups of connections: x0-x1-x2-x3-x4 and x0-x2-x1-x3-x4. There are two possible ways because two values may immediately follow x0: x1 or x2. The sequence is written above the circles in the illustration as well.

in your source `df`, "`x3`" occurs three times... is this an error? — Wimpel, Oct 12 '18 at 10:14
No, this is not an error. I'll add a picture that illustrate connections in couple of minutes. — Serhii, Oct 12 '18 at 10:17
You definitely should use igraph. Do the paths start and end always at the same respective vertices? Then you could do `library(igraph); lapply(all_shortest_paths(graph_from_data_frame(df), "x0", "x4")$res, as.vector)` and continue from there. — Roland, Oct 12 '18 at 10:30
Ronald, thanks! It gives sequence, all the rest I can do. But data.table way would be better. — Serhii, Oct 12 '18 at 10:33
I doubt you can find a data.table approach that is more efficient than igraph. But of course, you can combine data.table and igraph. It might be possible to improve my suggested approach. I'm not an expert with igraph since I usually don't deal with graphs. — Roland, Oct 12 '18 at 10:41

Andre Elrico · Answer 1 · 2018-10-12T10:51:44.023

2

I just post this since it gives the same output as Rolands suggestion:

(i'll remove it if its non-sense)

data:

library(data.table)
df = data.table(p1 = c("x0", "x0", "x1", "x2", "x3"),
                p2 = c("x1", "x2", "x3", "x3", "x4"))

code:

restElements <- setdiff(df$p1, df$p2)
ans <-
    t(do.call(
        expand.grid, c(restElements, unique(split(df$p2,df$p1)))
        ))

group = rep(1:ncol(ans), each = nrow(ans))

p     = c( ans )

sequence = as.numeric(factor(p))

data.table(p, sequence, group)

result:

#    p sequence group
#1: x0        1     1
#2: x1        2     1
#3: x3        4     1
#4: x4        5     1
#5: x0        1     2
#6: x2        3     2
#7: x3        4     2
#8: x4        5     2

please note:

make sure when you set the factor: factor(p), you get the correct order. (by default factor levels are just sorted. Works with this example, might not work with others.)
Instead of my ans it's probably wiser to use the igraph method.

So you can combine both:

borrowed from @Roland

lvls <- levels(factor(c(df$p1, df$p2)))
library(igraph);
tmp <- lapply(all_shortest_paths(graph_from_data_frame(df), lvls[1], lvls[length(lvls)])$res, as.vector)
ans <- sapply(tmp, function(x) { lvls[x] })

You can use this ans. Make sure you later use: sequence = as.numeric(factor(p, lvls))

edited Oct 12 '18 at 10:51

answered Oct 12 '18 at 10:34

Andre Elrico

10,956
6
50
69

Andre, In group 1 x2 is missing, while in group 2 x1 is missing. Perhaps I can modify your solution to get the required one. Thanks for the idea! – Serhii Oct 12 '18 at 10:42
I dont really understand. When we take your picture. If you go the "top-way" there is no `x2`. If we go the "lower-way" there is no `x1`. – Andre Elrico Oct 12 '18 at 10:47
You are right, some extra explanation should be added. The chart show connections. x0..xn may be treated as events. The next event may happen only if all previous events have already happened. In this way, x3 may happen only after x1 and x2. – Serhii Oct 12 '18 at 11:12
Well in that case you just need to "insert" the missing factor levels on the meaning full spots. – Andre Elrico Oct 12 '18 at 11:15
How can in that logic `x1` "happen" after `x2` ? – Andre Elrico Oct 12 '18 at 11:16
Let me give you an example. Suppose we have two types of fruits: apples and peaches. x1 means apples are eaten, x2 means peaches are eaten. x3 means fruits are eaten. Some may start from any fruit he/she likes. But fruits are eaten (x3) only when apples (x1) and peaches are eaten (x2). – Serhii Oct 12 '18 at 11:22

Frank · Accepted Answer · 2018-10-13T15:00:47.197

You could assign a rank to each node (assuming you have a graph for which this makes any sense)...

vdf = data.table(p = sort(unique(unlist(df[, c("p1", "p2")]))))

i = 0L
vdf[, r := 0L]
while (any(vdf[r == i, p] %in% df$p1)){
  vdf[r == i, r := r + !df[.(p), on=.(p1), p %in% setdiff(p1, p2)]]
  i = i + 1L
}

    p r
1: x0 0
2: x1 1
3: x2 1
4: x3 2
5: x4 3

If there's a unique first event, x0, then thanks to @Roland, here's a simpler way:

library(igraph)
vdf[, r := as.vector(distances(graph_from_data_frame(df), "x0"))]

Then, for each rank having more than one node, take all permutations (here, borrowing from Generating all distinct permutations of a list in R)...

wdf = vdf[, do.call(cbind, lapply(split(.I, r), function(x) as.data.table(
  gtools::permutations(length(x), length(x), x)
)))]

   0.V1 1.V1 1.V2 2.V1 3.V1
1:    1    2    3    4    5
2:    1    3    2    4    5

The values in wdf are row numbers (see ?.I) of vdf, so...

mdf = melt(wdf[, g := .I], id = "g", value.name = "w")[order(g, variable)]
vdf[mdf$w, .(p, g = mdf$g, r)][, seq := rowid(g)][]

     p g r seq
 1: x0 1 0   1
 2: x1 1 1   2
 3: x2 1 1   3
 4: x3 1 2   4
 5: x4 1 3   5
 6: x0 2 0   1
 7: x2 2 1   2
 8: x1 2 1   3
 9: x3 2 2   4
10: x4 2 3   5

So g is the "group" mentioned in the OP; r is the rank; seq is the sequence within the group (useful so that the sorting of the table is explicit).

Comment. I would stop after assigning the rank/depth attribute to each node in vdf. All the information about feasible sequences of events is here, but enumerating them (as in the OP's output) can be very costly, in terms of computational time and space, and so should be avoided if possible.

The number of permutations for events x sharing the same rank is is factorial(length(x)), so for example if x has a length of 10, the matrix returned has dimensions dim(gtools::permutations(10, 10)) = 3628800 x 10. My computer hangs when trying to compute it.

The rank/depth seems like something that should be in igraph, but I couldn't find it. It makes sense that the permutations thing is not in igraph since the OP's desired output is an inefficient way to represent the information already contained in the rank/depth. — Frank, Oct 12 '18 at 16:08
Frank, thank for the comprehensive answer! You mentioned that the desired output is not the best one. From your perspective, what would be the efficient way to represent output? — Serhii, Oct 13 '18 at 08:07
@Sergiy Cool, I've added an explanation of what I meant at the end of the question now. — Frank, Oct 13 '18 at 15:01

sequence events consistent with a two-column table of connections

2 Answers2