16

I am working with d3.js to visualise families of animals (organisms) (up to 4000 at a time) as a tree graph, though the data source could just as well be a directory listing, or list of namespaced objects. my data looks like:

json = {
    organisms:[
        {name: 'Hemiptera.Miridae.Kanakamiris'},
        {name: 'Hemiptera.Miridae.Neophloeobia.incisa'},
        {name: 'Lepidoptera.Nymphalidae.Ephinephile.rawnsleyi'},
        ... etc ...
    ]
}

my question is: I am trying to find the best way to convert the above data to the hierarchical parent / children data structure as is used by a number of the d3 visualisations such as treemap (for data example see flare.json in the d3/examples/data/ directory). Here is an example of the desired data structure:

{"name": "ROOT",
 "children": [
        {"name": "Hemiptera",
         "children": [
             {"name": "Miridae",
              "children": [
                  {"name": "Kanakamiris", "children":[]},
                  {"name": "Neophloeobia",
                   "children": [
                       {"name": "incisa", "children":[] }
                   ]}
              ]}
         ]},
        {"name": "Lepidoptera",
         "children": [
             {"name": "Nymphalidae",
              "children": [
                  {"name": "Ephinephile",
                   "children": [
                       {"name": "rawnsleyi", "children":[] }
                   ]}
              ]}
         ]}
    ]}
}

EDIT: enclosed all the original desired data structure inside a ROOT node, so as to conform with the structure of the d3 examples, which have only one master parent node.

I am looking to understand a general design pattern, and as a bonus I would love to see some solutions in either javascript, php, (or even python). javascript is my preference. In regards to php: the data I am actually using comes from a call to a database by a php script that encodes the results as json. database results in the php script is an ordered array (see below) if that is any use for php based answers.

Array
(
    [0] => Array
        (
            ['Rank_Order'] => 'Hemiptera'
            ['Rank_Family'] => 'Miridae'
            ['Rank_Genus'] => 'Kanakamiris'
            ['Rank_Species'] => ''
        ) ........

where: 'Rank_Order' isParentOf 'Rank_Family' isParentOf 'Rank_Genus' isParentOf 'Rank_Species'

I asked a similar question focussed on a php solution here, but the only answer is not working on my server, and I dont quite understand what is going on, so I want to ask this question from a design pattern perspective, and to include reference to my actual use which is in javascript and d3.js.

Community
  • 1
  • 1
johowie
  • 2,475
  • 6
  • 26
  • 42
  • You could use a more compressed syntax like http://i.stack.imgur.com/V5zg0.jpg – Oriol Aug 26 '12 at 01:48
  • From your description it sounds like this would be better done server-side at the time the data is extracted from the DB. Your desired structure should have square brackets `[]` instead of the outermost curly brackets `{}` because you're using it as an array not an object. (Note also, you're not talking about JSON here, you're talking about JS objects. You're serialising the data as JSON to get it to the client, but the manipulation you want to do is after the JSON is parsed into a JS object.) – nnnnnn Aug 26 '12 at 01:48
  • @Oriol and @nnnnnn the desired datastructure / syntax i am trying to match is the structure used in the d3 examples so i dont have to modify that library. that library uses the keys `name` and `children`. @nnnnnn you are right that i should have used square brackets on the outermost, but this has made me realise i need to add a level above which i will call `root` instead than have the existing structure as children of it, so as to conform to the d3 treemap library. i will edit the example 'desired structure' – johowie Aug 26 '12 at 02:01
  • @nnnnnn i have made the edit, so that the structure is contained as children of a `ROOT` node, and hence i am still using the curly brackets on the outermost. once again this is because i want to use the d3 library as is – johowie Aug 26 '12 at 02:12
  • @Oriol can you describe a way to convert the initial 'flat' data to your 'nested' array structure in either javascript or php ? making things more compressed is good advice though ! – johowie Aug 26 '12 at 02:16
  • [this stack overflow question](http://stackoverflow.com/questions/11088303/how-to-convert-to-d3s-json-format?rq=1) is giving me a little insight, but not enough – johowie Aug 26 '12 at 02:25

3 Answers3

7

The following is specific to the structure you've provided, it could be made more generic fairly easily. I'm sure the addChild function can be simplified. Hopefully the comments are helpful.

function toHeirarchy(obj) {

  // Get the organisms array
  var orgName, orgNames = obj.organisms;

  // Make root object
  var root = {name:'ROOT', children:[]};

  // For each organism, get the name parts
  for (var i=0, iLen=orgNames.length; i<iLen; i++) {
    orgName = orgNames[i].name.split('.');

    // Start from root.children
    children = root.children;

    // For each part of name, get child if already have it
    // or add new object and child if not
    for (var j=0, jLen=orgName.length; j<jLen; j++) {
      children = addChild(children, orgName[j]);      
    }
  }
  return root;

  // Helper function, iterates over children looking for 
  // name. If found, returns its child array, otherwise adds a new
  // child object and child array and returns it.
  function addChild(children, name) {

    // Look for name in children
    for (var i=0, iLen=children.length; i<iLen; i++) {

      // If find name, return its child array
      if (children[i].name == name) {
        return children[i].children;        
      }
    }
    // If didn't find name, add a new object and 
    // return its child array
    children.push({'name': name, 'children':[]});
    return children[children.length - 1].children;
  }
}
RobG
  • 142,382
  • 31
  • 172
  • 209
  • comparing your solution to that by @nnnnnn (only with 400 organisms and 4 levels of classification) it does seem to be faster (approx 2x). a difference between this solution and that by nnnnnn is the way that zero length strings are handled. this solution will make child nodes for those zero length names, both along the branch and for 'leaf' nodes, whereas, nnnnnn's does not create children for zero length names, and also will truncate the 'branch' if there is a zero length name along the 'path/branch'. Obviously I never specified the desired behaviour for this case ! +1 for working answer – johowie Aug 27 '12 at 06:20
  • Behaviour for zero–length strings is an artefact (I suspect nnnnnn's is too), I would have thought they were invalid. If they should have a specific behaviour, you should indicate it and how they might occur in the original data. – RobG Aug 27 '12 at 11:33
  • Yes I agree and so zero-length strings will be invalidated or handled before they get this far.For anyone interested in my data-source I even have fields like 'Rank_super-family' which is optional, and some entries with no 'Rank_Species' as was shown in the example data. All these zero-length strings will be 'vanished' – johowie Aug 27 '12 at 13:46
  • btw it is the RobG solution that seems to be faster than the nnnnnn solution (only on initial tests) – johowie Aug 27 '12 at 13:51
5

Given your starting input I believe something like the following code will produce your desired output. I don't imagine this is the prettiest way to do it, but it's what came to mind at the time.

It seemed easiest to pre-process the data to first split up the initial array of strings into an array of arrays like this:

[
   ["Hemiptera","Miridae","Kanakamiris" ],
   ["Hemiptera","Miridae","Neophloeobia","incisa" ],
   //etc
]

...and then process that to get a working object in a form something like this:

  working = {
       Hemiptera : {
           Miridae : {
              Kanakamiris : {},
              Neophloeobia : {
                  incisa : {}
              }
           }
       },
       Lepidoptera : {
           Nymphalidae : {
              Ephinephile : {
                  rawnsleyi : {}
              }
           }
       }
    }

...because working with objects rather than arrays makes it easier to test whether child items already exist. Having created the above structure I then process it one last time to get your final desired output. So:

// start by remapping the data to an array of arrays
var organisms = data.organisms.map(function(v) {
        return v.name.split(".");
    });

// this function recursively processes the above array of arrays
// to create an object whose properties are also objects
function addToHeirarchy(val, level, heirarchy) {
    if (val[level]) {
        if (!heirarchy.hasOwnProperty(val[level]))
            heirarchy[val[level]] = {};
        addToHeirarchy(val, level + 1, heirarchy[val[level]]);
    }
}
var working = {};    
for (var i = 0; i < organisms.length; i++)
    addToHeirarchy(organisms[i], 0, working);

// this function recursively processes the object created above
// to create the desired final structure
function remapHeirarchy(item) {
    var children = [];
    for (var k in item) {
        children.push({
            "name" : k,
            "children" : remapHeirarchy(item[k])
        });
    }
    return children;
}

var heirarchy = {
    "name" : "ROOT",
    "children" : remapHeirarchy(working)
};

Demo: http://jsfiddle.net/a669F/1/

nnnnnn
  • 147,572
  • 30
  • 200
  • 241
  • I like the way you have used `hasOwnProperty`. on initial testing this does indeed transform the original data to the desired format. I think it is the recursive function pattern that I need to study and will learn most from. +1 for this answer – johowie Aug 26 '12 at 03:58
2

An alternative answer to my own question....In the past day I have learn't a great deal more about d3.js and in relation to this question d3.nest() with .key() and .entries() is my friend (all d3 functions). This answer involves changing the initial data, so it may not qualify as a good answer to the specific question i asked. However if someone has a similar question and can change things on the server then this is a pretty simple solution:

return the data from the database in this format:

json = {'Organisms': [
    { 'Rank_Order': 'Hemiptera',
      'Rank_Family': 'Miridae',
      'Rank_Genus': 'Kanakamiris',
      'Rank_Species': '' },
    {}, ...
]}

Then using d3.nest()

organismNest = d3.nest()
    .key(function(d){return d.Rank_Order;})
    .key(function(d){return d.Rank_Family;})
    .key(function(d){return d.Rank_Genus;})
    .key(function(d){return d.Rank_Species;})
    .entries(json.Organism);

this returns:

{
key: "Hemiptera"
  values: [
    {
      key: "Cicadidae"
      values: [
        {
          key: "Pauropsalta "
          values: [
            {
              key: "siccanus"
              values: [
                       Rank_Family: "Cicadidae"
                       Rank_Genus: "Pauropsalta "
                       Rank_Order: "Hemiptera"
                       Rank_Species: "siccanus"
                       AnotherOriginalDataKey: "original data value"

etc etc, nested and lovely

This returns something very much similar to they array that I described as my desired format above in the question, with a few differences. In particular, There is no all enclosing ROOT element and also whereas they keys I originally wanted were "name" and "children" .nest() returns keys as "key" and "values" respectively. These alternatives keys are easy enough to use in d3.js by just defining appropriate data accessor functions (basic d3 concept) ... but that is getting beyond the original scope of the question ... hope that helps someone too

johowie
  • 2,475
  • 6
  • 26
  • 42