How to add index numbers to the nodes of a tree

Question

Suppose I have this data type for representing a tree (a rose tree):

type tree =
    | Function of string * tree list
    | Terminal of int

For example:

Function ("+", [Function ("*", [Terminal 5; Terminal 6]);
                Function ("sqrt", [Terminal 3])])

represents the following tree ((5 * 6) + sqrt(3)):

I want to convert this tree into another tree data structure called an "indexed tree" that contains the depth-first (or breath-first) index of each node. In the image above, I have labelled all nodes with their depth-first index.

This is the data type for indexed trees:

type index = int
type indexed_tree =
    | IFunction of index * string * indexed_tree list
    | ITerminal of index * int

This represents the indexed tree (depth-first) for the image above:

IFunction (0, "+", [IFunction (1, "*", [ITerminal (2, 5); ITerminal (3, 6)]);
                    IFunction (4, "sqrt", [ITerminal (5, 3)])])

This represents the indexed tree (breadth-first) for the image above:

IFunction (0, "+", [IFunction (1, "*", [ITerminal (3, 5); ITerminal 4, 6)]);
                    IFunction (2, "sqrt", [ITerminal (5, 3)])])

Now the problem is: how do I define a function tree -> indexed_tree?

I tried to adapt DFS and BFS techniques of keeping a stack, but I soon realized that this problem is completely different. DFS and BFS are only searching for one item, and they can ignore the rest of the tree. Here, I am trying to label the nodes of a tree with their index numbers. How can I do this?

EDIT

Below is my implementation for getting the subtree rooted at the specified index (both depth-first indexing and breadth-first indexing are implemented). I am unable to see how I could adapt this implementation to convert a given tree into an indexed tree. I tried to make use of counter (see implementation below), but the complication is that a depth-first traversal has to backtrack, and I don't know how to pass the counter around when backtracking.

(* Helper function for subtree_index_dfs and subtree_index_bfs.
 * join_func should either be prepend (for depth-first), or postpend
 * (for breadth-first). *)
let subtree_index tree index join_func =
    let node_children = function
        | Terminal _ -> []
        | Function (_, children) -> children in
    let rec loop counter stack =
        match stack with
        | [] -> failwith "Index out of bounds"
        | (hd::_) when counter = index -> hd
        | (hd::tl) -> loop (counter + 1) (join_func (node_children hd) tl)
    in
    loop 0 [tree]

(* Get the subtree rooted at the specified index.
 * Index starts at 0 at the root of the tree and is ordered depth-first. *)
let subtree_index_dfs tree index =
    let prepend a b =
        a@b
    in
    subtree_index tree index prepend

(* Get the subtree rooted at the specified index.
 * Index starts at 0 at the root of the tree and is ordered breadth-first. *)
let subtree_index_bfs tree index =
    let append a b =
        b@a
    in
    subtree_index tree index append

(* Misc. *)
let rec string_of_tree t =
    match t with
    | Terminal i -> string_of_int i
    | Function (sym, children) ->
        let children_str = List.map (fun child -> string_of_tree child) children
        in
        "(" ^ sym ^ " " ^ String.concat " " children_str ^ ")"

let print_tree t =
    print_endline (string_of_tree t)

Example usage:

let () =
    let t1 = Function ("+", [Function ("*", [Terminal 5; Terminal 6]);
                             Function ("sqrt", [Terminal 3])])
    in
    print_tree (subtree_index_dfs t1 0);  (* (+ ( * 5 6) (sqrt 3)) *)
    print_tree (subtree_index_dfs t1 1);  (* ( * 5 6) *)
    print_tree (subtree_index_dfs t1 2);  (* 5 *)
    print_tree (subtree_index_dfs t1 3);  (* 6 *)
    print_tree (subtree_index_dfs t1 4);  (* (sqrt 3) *)
    print_tree (subtree_index_dfs t1 5);  (* 3 *)
    print_tree (subtree_index_dfs t1 6);  (* Exception: Failure "Index out of bounds". *)

This is not really different. DFS and BFS can be reformulated as traversal algorithms that transform the tree rather than search anything. — n. m. could be an AI, Dec 25 '20 at 00:12
For homework questions you should include some code of your own to demonstrate that you've actually made an attempt yourself and to give an indication of what exactly you're struggling with. Otherwise it's hard to give an answer that doesn't just provide a solution. As Mr. pronouns points out it should be pretty straightforward to adapt the DFS and BFS algorithms to do transformation instead of search. Getting the indexes right is trickier, but having a working transform function is a necessary first step, so I suggest you start with that. — glennsl, Dec 25 '20 at 01:50
@glennsl This is not a homework question; I formulated the question and drew the diagram on my own. The problem is that I am completely stuck. I don't even know how to sketch the skeleton of the function. I have solved some related problems before. For example, I solved [How do I get a subtree by index?](https://stackoverflow.com/q/65000785) using continuation-passing style. — Flux, Dec 25 '20 at 08:05
Even if not homework it would still be very helpful to have some piece of code to base an answer on, to narrow the question down a bit and perhaps give some indication as to what has gotten you stuck. Even if just a working DFS and/or BFS function that can then be adapted. I could of course just write up a solution, but I don't think that's a very satisfying answer, nor is it particularly satisfying to write such an answer. — glennsl, Dec 25 '20 at 14:35
CPS seems the right choice. I once posted a Scheme (and pseudocode) [answer](https://stackoverflow.com/a/50588667/849891) to a [very similar problem](https://stackoverflow.com/questions/50554603/fill-a-nested-structure-with-values-from-a-linear-supply-stream) which has two more answers, one in Scala and another in JS. — Will Ness, Dec 25 '20 at 15:46
@glennsl Okay, thanks for the advice. I have added an implementation of breadth-first traversal and depth-first traversal. — Flux, Dec 25 '20 at 16:22
*"how to pass the counter around when backtracking"* exactly what my Scheme answer is showing, except, not *backtracking* but *continuing*, as we are not redoing anything, but rather are continuing with the rest of the structure having processed a part of it. this approach uses so-called "continuation functions" i.e. not the full-blown Scheme continuations, but just "to-do-next" functional parameters as used in CPS -- which is what it is, continuation-passing style (again, we're not passing full continuations in CPS, just some "do-next" regular functions). and OCaml is very close to Scheme. — Will Ness, Dec 25 '20 at 16:41
here's one example: [Continuation Passing Style in ocaml](https://stackoverflow.com/q/46946065/849891). — Will Ness, Dec 25 '20 at 16:42
@n.'pronouns'm. Do you mind sharing the method for reformulating DFS and BFS to transform the tree? I have added my implementation of depth-first and breadth-first traversal. I am unable to see how I could tweak it to transform the tree instead. — Flux, Dec 25 '20 at 16:56
I'm not too familiar with ocaml, I can show a Haskell implementation. DFS is not too complicated, BFS is trickier, I need to think about it... — n. m. could be an AI, Dec 25 '20 at 19:24
Just as a side note, you could remove the definition of prepend and write ```subtree_index tree index ( @ )```. You could also do the (almost) same trick with append using ```Fun.flip``` ;) — ghilesZ, Dec 29 '20 at 10:47

score 2 · Answer 1 · answered Dec 26 '20 at 05:39

This isn't a complete answer, and I think asking for both depth-first and breadth-first is a bit too broad as I don't see there being much to generallize, but I hope it will at least get you a bit further, and perhaps spark some ideas.

I think the problem you're having stems from your current code being overgeneralized. You're basically transforming the tree, rather inefficiently, into a list and then indexing into that list. You can't then transform the list back into a tree, because you've thrown that information away.

Depth-first traversal is really very simple, and lends itself naturally to a recursive implementation. If we ignore the index for now, the tree -> indexed_tree function is just this:

let indexed tree =
  let rec loop = function
    | Terminal value -> ITerminal (0, value)
    | Function (name, children) -> IFunction (0, name, loop_children children)
  and loop_children = function
    | [] -> []
    | child :: rest -> loop child :: loop_children rest
  in loop tree

Then we just have to fill in the index. This does so by passing an incrementing index up while recursing, and returning the node count along with the constructed subtree on the way down so that we know how much to increment the index by:

let indexed_dfs tree =
  let rec loop i = function
    | Function (name, children) ->
      let (child_count, children') = loop_children (i + 1) children in
      (child_count + 1, IFunction (i, name, children'))
    | Terminal value -> (1, ITerminal (i, value))
  and loop_children i = function
    | [] -> (0, [])
    | child :: rest ->
      let (count, child') = loop i child in
      let (rest_count, rest') = loop_children (i + count) rest in
      (count + rest_count, child' :: rest')
in snd (loop 0 tree)

Instead of the count each call could also just have returned the last index, but I thought it was simpler to follow two separate concepts instead of overloading a single one.

Breadth-first transformation is unfortunately significantly trickier, as we can't easily construct a new tree breadth-first. But we also shouldn't need to. I think instead of keeping track of the total count it might be an idea to keep track of the accumulated counts, or offsets, of each level by passing around a stack of offsets for the levels seen thus far, and then use that to calculate the index of the node currently being constructed.

it is relatively easy to construct a tree breadth-first [*lazily*](https://stackoverflow.com/a/60561480/849891). a corresponding way in a strict language is to do so by building it top-down, filling in the nodes' child(ren) field(s) in recursive call(s) (yes, destructively, by mutation), a la tail recursion modulo cons. it is easiest in C, with array-backed trees, where it is a no-op. (all this refers to binary trees though). — Will Ness, Dec 29 '20 at 18:32

How to add index numbers to the nodes of a tree

1 Answers1