Getting the set of nodes connected till the main parent node in R

Question

I have a data set which has 6 rows and 3 columns. The first column represents children, whereas second column onward immediate parents of the corresponding child is allocated.

Above, one can see that "a" and "b" don't have any parents. whereas "c" has only parent and that is "a". "d" has parents "b" and "c" and so on.

What I need is: if given the input as the child, it should give me all the ancestors of that child including child.

e.g. "f" is the child I chose then desired output should be : {"f", "d", "b"}, {"f", "d", "c", "a"}, {"f", "e", "b"}, {"f", "e", "c", "a"}.

Note: Order of the nodes does not matter.

Thank you so much in advance.

It would help if you gave us code with data - http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Bulat, May 24 '16 at 06:47
@Bulat, I am asking for the code. And that image file is the data. — Artiga, May 24 '16 at 06:51
data for reproducible example would be : `d <- data.frame(list("c" = c("a", "b", "c", "d", "e", "f"), "p1" = c(NA, NA, "a", "b", "b", "d"), "p2" = c(NA, NA, NA, "c", "c", "e")))` — Bulat, May 24 '16 at 06:56
@Bulat, yes. actually I didn't think it that way as I was importing it from excel. — Artiga, May 24 '16 at 06:58
You probably want to look into the `igraph` package if you are going to be doing a lot of things with trees and graph networks. — Spacedman, May 24 '16 at 07:42

score 3 · Answer 1 · answered May 24 '16 at 08:09

Create sample data. Note use of stringsAsFactors here, I'm assuming your data are characters and not factors:

> d <- data.frame(list("c" = c("a", "b", "c", "d", "e", "f"), "p1" = c(NA, NA, "a", "b", "b", "d"), "p2" = c(NA, NA, NA, "c", "c", "e")),stringsAsFactors=FALSE)

First tidy it up - make the data long, not wide, with each row being a child-parent pair:

> pairs = subset(reshape2::melt(d,id.vars="c",value.name="parent"), !is.na(parent))[,c("c","parent")]
> pairs
   c parent
3  c      a
4  d      b
5  e      b
6  f      d
10 d      c
11 e      c
12 f      e

Now we can make a graph of the parent-child relationships. This is a directed graph, so plots child-parent as an arrow:

> g = graph.data.frame(pairs)
> plot(g)

Now I'm not sure exactly what you want, but igraph functions can do anything... So for example, here's a search of the graph starting at d from which we can get various bits of information:

> d_search = bfs(g,"d",neimode="out", unreachable=FALSE, order=TRUE, dist=TRUE)

First, which nodes are ancestors of d? Its the ones that can be reached from d via the exhaustive (here, breadth-first) search:

> d_search$order
+ 6/6 vertices, named:
[1] d    c    b    a    <NA> <NA>

Note it includes d as well. Trivial enough to drop from this list. That gives you the set of ancestors of d which is what you asked for.

What is the relationship of those nodes to d?

> d_search$dist
  c   d   e   f   a   b 
  1   0 NaN NaN   2   1

We see that e and f are unreachable, so are not ancestors of d. c and b are direct parents, and a is a grandparent. You can check this from the graph.

You can also get all the paths from any child upwards using functions like shortest_paths and so on.

Bulat · Accepted Answer · 2016-05-24T22:47:20.967

Here is a recursive function that makes all possible family lines:

d <- data.frame(list("c" = c("a", "b", "c", "d", "e", "f"), 
      "p1" = c(NA, NA, "a", "b", "b", "d"), 
      "p2" = c(NA, NA, NA, "c", "c", "e")), stringsAsFactors = F)

# Make data more convenient for the task.
library(reshape2)
dp <-  melt(d, id = c("c"), value.name = "p") 

# Recursive function builds ancestor vectors.
getAncestors <- function(data, x, ancestors = list(x)) {

  parents <- subset(data, c %in% x & !is.na(p), select = c("c", "p"))

  if(nrow(parents) == 0) {
    return(ancestors)
  }

  x.c <- parents$c
  p.c <- parents$p

  ancestors <- lapply(ancestors, function(x) {
    if (is.null(x)) return(NULL)

    # Here we want to repeat ancestor chain for each new parent.
    res <- list()
    matches <- 0
    for (i in 1:nrow(parents)) {
      if (tail(x, 1) == parents[i, ]$c){
       res[[i]] <- c(x, parents[i, ]$p)
       matches <- matches + 1
      }
    }

    if (matches == 0) { # There are no more parents. 
      res[[1]] <- x
    }

    return (res)
  })

  # remove one level of lists.
  ancestors <- unlist(ancestors, recursive = F)

  res <- getAncestors(data, p.c, ancestors)
  return (res)

}

# Demo of results for the lowest level.
res <- getAncestors(dp, "f")
res
#[[1]]
#[1] "f" "d" "b"

#[[2]]
#[1] "f" "d" "c" "a"

#[[3]]
#[1] "f" "e" "b"

#[[4]]
#[1] "f" "e" "c" "a"

You will need to implement this in a similar way through recursion or with a while loop.

can you please tell me where do have to make changes for getting the output. I have been trying ever since you gave me the approach but not able to success. — Artiga, May 24 '16 at 12:19
@Artiga I think I have managed to make it work per your description — Bulat, May 24 '16 at 22:48

Getting the set of nodes connected till the main parent node in R

2 Answers2