0

I have a data set which has 6 rows and 3 columns. The first column represents children, whereas second column onward immediate parents of the corresponding child is allocated. enter image description here

Above, one can see that "a" and "b" don't have any parents. whereas "c" has only parent and that is "a". "d" has parents "b" and "c" and so on.

What I need is: if given the input as the child, it should give me all the ancestors of that child including child.

e.g. "f" is the child I chose then desired output should be : {"f", "d", "b"}, {"f", "d", "c", "a"}, {"f", "e", "b"}, {"f", "e", "c", "a"}.

Note: Order of the nodes does not matter.

Thank you so much in advance.

Artiga
  • 776
  • 2
  • 16
  • 37
  • It would help if you gave us code with data - http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Bulat May 24 '16 at 06:47
  • @Bulat, I am asking for the code. And that image file is the data. – Artiga May 24 '16 at 06:51
  • 1
    data for reproducible example would be : `d <- data.frame(list("c" = c("a", "b", "c", "d", "e", "f"), "p1" = c(NA, NA, "a", "b", "b", "d"), "p2" = c(NA, NA, NA, "c", "c", "e")))` – Bulat May 24 '16 at 06:56
  • @Bulat, yes. actually I didn't think it that way as I was importing it from excel. – Artiga May 24 '16 at 06:58
  • 1
    You probably want to look into the `igraph` package if you are going to be doing a lot of things with trees and graph networks. – Spacedman May 24 '16 at 07:42

2 Answers2

3

Create sample data. Note use of stringsAsFactors here, I'm assuming your data are characters and not factors:

> d <- data.frame(list("c" = c("a", "b", "c", "d", "e", "f"), "p1" = c(NA, NA, "a", "b", "b", "d"), "p2" = c(NA, NA, NA, "c", "c", "e")),stringsAsFactors=FALSE)

First tidy it up - make the data long, not wide, with each row being a child-parent pair:

> pairs = subset(reshape2::melt(d,id.vars="c",value.name="parent"), !is.na(parent))[,c("c","parent")]
> pairs
   c parent
3  c      a
4  d      b
5  e      b
6  f      d
10 d      c
11 e      c
12 f      e

Now we can make a graph of the parent-child relationships. This is a directed graph, so plots child-parent as an arrow:

> g = graph.data.frame(pairs)
> plot(g)

enter image description here

Now I'm not sure exactly what you want, but igraph functions can do anything... So for example, here's a search of the graph starting at d from which we can get various bits of information:

> d_search = bfs(g,"d",neimode="out", unreachable=FALSE, order=TRUE, dist=TRUE)

First, which nodes are ancestors of d? Its the ones that can be reached from d via the exhaustive (here, breadth-first) search:

> d_search$order
+ 6/6 vertices, named:
[1] d    c    b    a    <NA> <NA>

Note it includes d as well. Trivial enough to drop from this list. That gives you the set of ancestors of d which is what you asked for.

What is the relationship of those nodes to d?

> d_search$dist
  c   d   e   f   a   b 
  1   0 NaN NaN   2   1

We see that e and f are unreachable, so are not ancestors of d. c and b are direct parents, and a is a grandparent. You can check this from the graph.

You can also get all the paths from any child upwards using functions like shortest_paths and so on.

Spacedman
  • 92,590
  • 12
  • 140
  • 224
1

Here is a recursive function that makes all possible family lines:

d <- data.frame(list("c" = c("a", "b", "c", "d", "e", "f"), 
      "p1" = c(NA, NA, "a", "b", "b", "d"), 
      "p2" = c(NA, NA, NA, "c", "c", "e")), stringsAsFactors = F)

# Make data more convenient for the task.
library(reshape2)
dp <-  melt(d, id = c("c"), value.name = "p") 

# Recursive function builds ancestor vectors.
getAncestors <- function(data, x, ancestors = list(x)) {

  parents <- subset(data, c %in% x & !is.na(p), select = c("c", "p"))

  if(nrow(parents) == 0) {
    return(ancestors)
  }

  x.c <- parents$c
  p.c <- parents$p

  ancestors <- lapply(ancestors, function(x) {
    if (is.null(x)) return(NULL)

    # Here we want to repeat ancestor chain for each new parent.
    res <- list()
    matches <- 0
    for (i in 1:nrow(parents)) {
      if (tail(x, 1) == parents[i, ]$c){
       res[[i]] <- c(x, parents[i, ]$p)
       matches <- matches + 1
      }
    }

    if (matches == 0) { # There are no more parents. 
      res[[1]] <- x
    }

    return (res)
  })

  # remove one level of lists.
  ancestors <- unlist(ancestors, recursive = F)

  res <- getAncestors(data, p.c, ancestors)
  return (res)

}

# Demo of results for the lowest level.
res <- getAncestors(dp, "f")
res
#[[1]]
#[1] "f" "d" "b"

#[[2]]
#[1] "f" "d" "c" "a"

#[[3]]
#[1] "f" "e" "b"

#[[4]]
#[1] "f" "e" "c" "a"

You will need to implement this in a similar way through recursion or with a while loop.

Bulat
  • 6,869
  • 1
  • 29
  • 52
  • can you please tell me where do have to make changes for getting the output. I have been trying ever since you gave me the approach but not able to success. – Artiga May 24 '16 at 12:19
  • 1
    @Artiga I think I have managed to make it work per your description – Bulat May 24 '16 at 22:48
  • This is really elegant solution. Thank you so much. :) – Artiga May 25 '16 at 04:37