11

I have an array of arrays that looks like this:

var arrays = [[1,2,3,4,5],
              [1,2,6,4,5],
              [1,3,6,4,5],
              [1,2,3,6,5],
              [1,7,5],
              [1,7,3,5]]

I want to use d3.nest() or even just standard javascript to convert this data into a nested data structure that I can use with d3.partition.

Specifically, I want to create this flare.json data format.

The levels of the json object I want to create with d3.nest() correspond to the index positions in the array. Notice that 1 is in the first position in all the subarrays in the example data above; therefore, it is at root of the tree. At the next positions in the arrays there are three values, 2, 3, and 7, therefore, the root value 1 has 3 children. At this point the tree looks like this:

      1
    / | \
   2  3  7

At the third position in the subarrays there are four values, 3, 5, and 6. These children would be places into the tree as follows:

            1
        ____|___
       /    |    \
      2     3     7
     / \   /     / \
    3   6 6     3   5

How can I produce this data structure using d3.nest()? The full data structure with the example data I showed above should look like this:

   {"label": 1, 
     "children": [
        {"label": 2, "children": [
            {"label": 3, "children": [
                {"label": 4, "children": [
                    {"label": 5}
                ]},
                {"label": 6, "children": [
                    {"label": 5}
                ]}
            ]},
            {"label": 6, "children": [
                {"label": 4, "children": [
                    {"label": 5}
                ]}
            ]},
        {"label": 3, "children": [
            {"label": 6, "children": [
                {"label": 4, "children": [
                    {"label": 5}
                ]}
            ]}
        ]},
        {"label": 7, "children": [
            {"label": 3, "children": [
                {"label": 5}
            ]},
            {"label": 5}
        ]}
      ]}
    ]}

I'm trying to convert my array data structure above using something like this (very wrong):

var data = d3.nest()
  .key(function(d, i) { return d.i; })
  .rollup(function(d) { return d.length; })

I've been banging my head for a week to try and understand how I can produce this hierarchical data structure from an array of arrays. I'd be very grateful if someone could help me out.

@meetamit's answer in the comments is good, but in my case my tree is too deep to repeatedly apply .keys() to the data, so I cannot manually write a function like this.

Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
turtle
  • 7,533
  • 18
  • 68
  • 97
  • 1
    First, since you want multiple nested levels, you need to call `d3.nest().key(...)` multiple times. The function you pass into the first time you call `key(function() {})` will produce the first level; 2nd function will produce 2nd level, etc. Check out the console output in [this fiddle](http://jsfiddle.net/J6mPk/). It should help you figure out how to get it to the form you want it to be in. – meetamit Jan 03 '14 at 22:45
  • That's profoundly helpful. Thank you. – turtle Jan 03 '14 at 22:50
  • It sounds like you want a recursive function and not `d3.nest`. – Lars Kotthoff Jan 06 '14 at 21:11
  • 1
    I've been looking [at this function](http://stackoverflow.com/a/11945498/1255817), which is similar to what I am looking for; however, I can't seem to adapt it correctly to product the correct tree structure. – turtle Jan 06 '14 at 21:16

4 Answers4

12

Here's a more straightforward function that just uses nested for-loops to cycle through all the path instructions in each of your set of arrays.

To make it easier to find the child element with a given label, I have implemented children as a data object/associative array instead of a numbered array. If you want to be really robust, you could use a d3.map for the reasons described at that link, but if your labels are actually integers than that's not going to be a problem. Either way, it just means that when you need to access the children as an array (e.g., for the d3 layout functions), you have to specify a function to make an array out of the values of the object -- the d3.values(object) utility function does it for you.

The key code:

var root={}, 
    path, node, next, i,j, N, M;

for (i = 0, N=arrays.length; i<N; i++){
    //for each path in the data array 
    path = arrays[i];
    node = root; //start the path from the root

    for (j=0,M=path.length; j<M; j++){
        //follow the path through the tree
        //creating new nodes as necessary

        if (!node.children){ 
            //undefined, so create it:
            node.children = {}; 
        //children is defined as an object 
        //(not array) to allow named keys
        }

        next = node.children[path[j]];
        //find the child node whose key matches
        //the label of this step in the path

        if (!next) {
            //undefined, so create
            next = node.children[path[j]] = 
                {label:path[j]};
        }
        node = next; 
        // step down the tree before analyzing the
        // next step in the path.        
    }    
}

Implemented with your sample data array and a basic cluster dendogram charting method:
http://fiddle.jshell.net/KWc73/

Edited to add: As mentioned in the comments, to get the output looking exactly as requested:

  1. Access the data's root object from the default root object's children array.
  2. Use a recursive function to cycle through the tree, replacing the children objects with children arrays.

Like this:

root = d3.values(root.children)[0];
//this is the root from the original data, 
//assuming all paths start from one root, like in the example data

//recurse through the tree, turning the child
//objects into arrays
function childrenToArray(n){
    if (n.children) {
        //this node has children

        n.children = d3.values(n.children);
        //convert to array

        n.children.forEach(childrenToArray);
        //recurse down tree
    }
}
childrenToArray(root);

Updated fiddle:
http://fiddle.jshell.net/KWc73/1/

AmeliaBR
  • 27,344
  • 6
  • 86
  • 119
  • P.S. The code creates an unlabeled root node, both because it made the code simpler and to allow for the fact that all the paths in the data might not start with the same label. If you don't want to include that extra root in your graph, just use `root = d3.values(root.children)[0]` to skip to the first (and in this case only) child of the default root. – AmeliaBR Jan 07 '14 at 05:49
  • +1 for the graph! Looks like our two methods are duals -- you do it path by path and I level by level. – Lars Kotthoff Jan 07 '14 at 06:28
  • Thanks. This is cool, although not quite correct. Your solution does not have a "label" at the top level and the children should be a list, you have them as a dictionary. – turtle Jan 07 '14 at 13:01
  • @turtle: I wouldn't normally bother fussing with those details beyond telling you how to fix them. But since you've got a big bounty on the question, I figured I had better answer it explicitly as asked. Edited. – AmeliaBR Jan 07 '14 at 16:25
  • @LarsKotthoff: Exactly. Different approaches to the same problem. My instinct is that this approach would be slightly faster than yours for large data sets, since I'm not making temp arrays or using filters, but the extra recursion step at the end might cancel out some of the advantage. Another advantage I'd claim is that the code is easier to read/understand -- but then again, I'm biased, since I wrote the code! – AmeliaBR Jan 07 '14 at 16:34
  • @AmeliaBR congrats on getting a max bounty! – VividD Jan 15 '14 at 00:39
1

If you extend the specification of Array, it's not actually that complex. The basic idea is to build up the tree level by level, taking each array element at a time and comparing to the previous one. This is the code (minus extensions):

function process(prevs, i) {
  var vals = arrays.filter(function(d) { return prevs === null || d.slice(0, i).compare(prevs); })
                 .map(function(d) { return d[i]; }).getUnique();
  return vals.map(function(d) {
    var ret = { label: d }
    if(i < arrays.map(function(d) { return d.length; }).max() - 1) {
        tmp = process(prevs === null ? [d] : prevs.concat([d]), i+1);
        if(tmp.filter(function(d) { return d.label != undefined; }).length > 0)
          ret.children = tmp;
    }
    return ret;
  });
}

No guarantees that it won't break for edge cases, but it seems to work fine with your data.

Complete jsfiddle here.

Some more detailed explanations:

  • First, we get the arrays that are relevant for the current path. This is done by filtering out those that are not the same as prevs, which is our current (partial) path. At the start, prevs is null and nothing is filtered.
  • For these arrays, we get the values that corresponds to the current level in the tree (the ith element). Duplicates are filtered. This is done by the .map() and .getUnique().
  • For each of the values we got this way, there will be a return value. So we iterate over them (vals.map()). For each, we set the label attribute. The rest of the code determines whether there are children and gets them through a recursive call. To do this, we first check whether there are elements left in the arrays, i.e. if we are at the deepest level of the tree. If so, we make the recursive call, passing in the new prev that includes the element we are currently processing and the next level (i+1). Finally, we check the result of this recursive call for empty elements -- if there are only empty children, we don't save them. This is necessary because not all of the arrays (i.e. not all of the paths) have the same length.
Lars Kotthoff
  • 107,425
  • 16
  • 204
  • 204
  • Wow, this is great! I'm not that good at JS, so it will take me a bit of time to understand your code. One thing I can see is that the terminal nodes appear slightly incorrect. They have empty arrays and objects. For example, `[1,7,3,5]` has `{name: 5, children: []}` and `[1,7,5]` has `{name: 5, children: [[{children:[]}]}` – turtle Jan 06 '14 at 22:18
  • I noticed that and fixed it in the latest edit I've made (at least I think so -- let me know if I'm wrong). – Lars Kotthoff Jan 06 '14 at 22:24
  • I can't say I like the idea of messing with the array prototype for an initialization routine, but the concept is solid. This is definitely not a `d3.nest` situation. – AmeliaBR Jan 07 '14 at 01:44
  • You're not really messing with the `Array` prototype -- extending the standard functionality is a pretty common thing and not hacky at all. – Lars Kotthoff Jan 07 '14 at 06:21
  • @LarsKotthoff Maybe I'm still not comfortable with all the possibilities of JavaScript; if I was to implement your algorithm, I would create a dedicated class that inherits from Array. But I suppose it would only be a problem if another script on the page is also extending `Array` with the same function names but different functionality. Which is possible, but improbable. – AmeliaBR Jan 07 '14 at 16:38
  • That wouldn't actually be a problem as long as the two scripts are separate -- you can overwrite functions just like anything else. So I can use my version of the function and you can use your version of the function without problems. Ok, not quite -- once you have asynchronous callbacks things may start getting hairy ;) – Lars Kotthoff Jan 07 '14 at 18:17
0

Since d3-collection has been deprecated in favor of d3.array, we can use d3.groups to achieve what used to work with d3.nest:

var input = [
  [1, 2, 3, 4, 5],
  [1, 2, 6, 4, 5],
  [1, 3, 6, 4, 5],
  [1, 2, 3, 6, 5],
  [1, 7, 5],
  [1, 7, 3, 5]
];

function process(arrays, depth) {
  return d3.groups(arrays, d => d[depth]).map(x => {
    if (
      x[1].length > 1 ||                     // if there is more than 1 child
      (x[1].length == 1 && x[1][0][depth+1]) // if there is 1 child and the future depth is inferior to the child's length
    )
      return ({
        "label": x[0],
        "children": process(x[1], depth+1)
      });
    return ({ "label": x[0] });              // if there is no child
  });
};

console.log(process(input, 0));
<script src="https://d3js.org/d3-array.v2.min.js"></script>

This:

  • Works as a recursion on the arrays' depths.
  • Each recursion step groups (d3.groups) its arrays on the array element whose index is equal to the depth.
  • Depending on whether there are children or not, the recursion stops.

Here is the intermediate result produced by d3.groups within a recursion step (grouping arrays on there 3rd element):

var input = [
  [1, 2, 3, 4, 5],
  [1, 2, 6, 4, 5],
  [1, 2, 3, 6, 5]
];

console.log(d3.groups(input, d => d[2]));
<script src="https://d3js.org/d3-array.v2.min.js"></script>
Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
0

Edit - fixed

Here is my solution Pro:It is all in one go (doesn't need objects converting to arrays like above) Pro:It keeps the size/value count Pro:the output is EXACTLY the same as a d3 flare with children Con:it is uglier, and likely less efficient Big Thanks to previous comments for helping me work it out.

    var data = [[1,2,3,4,5],
        [1,2,6,4,5],
        [1,3,6,4,5],
        [1,2,3,6,5],
        [1,7,5],
        [1,7,3,5]]

    var root = {"name":"flare", "children":[]} // the output
    var node // pointer thingy
    var row



// loop through array
for(var i=0;i<data.length;i++){

row = data[i];
node = root;

    // loop through each field
    for(var j=0;j<row.length;j++){

        // set undefined to "null"
        if (typeof row[j] !== 'undefined' && row[j] !== null) {
            attribute = row[j]
        }else{
            attribute = "null"
        }

        // using underscore.js, does this field exist
        if(_.where(node.children, {name:attribute}) == false  ){


            if(j < row.length -1){
                // this is not the deepest field, so create a child with children
                var oobj = {"name":attribute, "children":[] }
                node.children.push(oobj)
                node = node.children[node.children.length-1]
            }else{
                // this is the deepest we go, so set a starting size/value of 1
                node.children.push({"name":attribute, "size":1 })
            }

        }else{

            // the fields exists, but we need to find where
            found = false
            pos = 0

            for(var k=0;k< node.children.length ;k++){
                if(node.children[k]['name'] == attribute){
                    pos = k
                    found = true
                    break       
                }
            }

            if(!node.children[pos]['children']){      
                // if no key called children then we are at the deepest layer, increment
                node.children[pos]['size'] = parseInt(node.children[pos]['size']) + 1
            }else{
                // we are not at the deepest, so move the pointer "node" and allow code to continue
                node = node.children[pos]
            }
        }
    }
}



// object here
console.log(root)

// stringified version to page
document.getElementById('output').innerHTML = JSON.stringify(root, null, 1);

Working examples https://jsfiddle.net/7qaz062u/

Output

{ "name": "flare", "children": [ { "name": 1, "children": [ { "name": 2, "children": [ { "name": 3, "children": [ { "name": 4, "children": [ { "name": 5, "size": 1 } ] } ] }, { "name": 6, "children": [ { "name": 4, "children": [ { "name": 5, "size": 1 } ] } ] } ] }, { "name": 3, "children": [ { "name": 6, "children": [ { "name": 4, "children": [ { "name": 5, "size": 1 } ] } ] }, { "name": 3, "children": [ { "name": 6, "children": [ { "name": 5, "size": 1 } ] } ] } ] }, { "name": 7, "children": [ { "name": 5, "size": 1 }, { "name": 3, "children": [ { "name": 5, "size": 1 } ] } ] } ] } ] }