
I have a dataframe that I want to convert to a hierarchical flare JSON for use in a D3 visualization like this: D3 sunburst

My dataframe contains hierarchical data such as this:

[Image: sample dataframe with columns group1, group2, group3 and value]

And the output I want should look like this:

{"name": "flare","children": 
    [
        {"name": "Animal", "children": 
            [
                {"name": "Mammal", "children":
                    [
                        {"name": "Fox","value":35000}, 
                        {"name": "Lion","value":25000}
                    ]
                },
                {"name": "Fish", "children":
                    [
                        {"name": "Cod","value":15000} 
                    ]
                }
            ]
        },
        {"name": "Plant", "children": 
            [
                {"name": "Tree", "children":
                    [
                        {"name": "Oak","value":1500} 
                    ]
                }
            ]
        }
     ]
} 

I have tried several approaches, but can't get it right. Here is my non-working code, inspired by this post: Pandas to D3. Serializing dataframes to JSON

from collections import defaultdict
import pandas as pd
df = pd.DataFrame({'group1':["Animal", "Animal", "Animal", "Plant"],'group2':["Mammal", "Mammal", "Fish", "Tree"], 'group3':["Fox", "Lion", "Cod", "Oak"],'value':[35000,25000,15000,1500]  })
tree = lambda: defaultdict(tree)  
d = tree()
for _, (group0,group1, group2, group3, value) in df.iterrows():
    d['name'][group0]['children'] = group1
    d['name'][group1]['children'] = group2
    d['name'][group2]['children'] = group3
    d['name'][group3]['children'] = value


json.dumps(d)
Ullsokk

1 Answer


I am working on a similar visualization project that requires moving data from a Pandas DataFrame to a JSON file that works with D3.

I came across your post while looking for a solution and ended up writing something based on this GitHub repository and with input from the link you provided in this post.

The code is not pretty; it is a bit hacky and slow. But in my project it has worked fine for any amount of data, as long as the data has three levels and a value field. You should be able to fork the D3 Sunburst notebook and replace the flare.json file with this code's output.

My modification to the original GitHub code is to handle three levels of data. If the level 0 node already exists, only the level 1 node and everything below it are appended. Likewise, if the level 1 node also exists, only the level 2 node (the third level) is appended. Otherwise, the full path is appended. If you need more levels, some kind of recursion might do the trick, or you can keep hacking this code to add more levels.

# code snippet to format a Pandas DataFrame as JSON for a D3 Sunburst chart

# libraries
import json
import pandas as pd

# example data with three levels and a single value field
data = {'group1': ['Animal', 'Animal', 'Animal', 'Plant'],
        'group2': ['Mammal', 'Mammal', 'Fish', 'Tree'],
        'group3': ['Fox', 'Lion', 'Cod', 'Oak'],
        'value': [35000, 25000, 15000, 1500]}

df = pd.DataFrame.from_dict(data)

print(df)

""" The sample dataframe
   group1  group2 group3  value
0  Animal  Mammal    Fox  35000
1  Animal  Mammal   Lion  25000
2  Animal    Fish    Cod  15000
3   Plant    Tree    Oak   1500
"""

# initialize a flare dictionary
flare = {"name": "flare", "children": []}

# iterate through dataframe values
for row in df.values:
    level0 = row[0]
    level1 = row[1]
    level2 = row[2]
    value = row[3]
    
    # collect the names of the existing first-level nodes
    key0 = [node['name'] for node in flare['children']]

    # collect the second-level names under this row's first-level node only;
    # gathering them from another branch would create duplicate nodes
    key1 = []
    if level0 in key0:
        parent = flare['children'][key0.index(level0)]
        key1 = [child['name'] for child in parent['children']]

    # add the full row of data if the root is not in key0
    if level0 not in key0:
        d = {'name': level0,
              'children': [{'name': level1,
                            'children': [{'name': level2,
                                          'value': value}]}]}
        flare['children'].append(d)

    elif level1 not in key1:

        # if the root exists but the second level does not,
        # append the second level together with its leaf
        d = {'name': level1,
             'children': [{'name': level2,
                           'value': value}]}

        flare['children'][key0.index(level0)]['children'].append(d)

    else:

        # if the root and the second level both exist, append only the leaf
        d = {'name': level2,
             'value': value}

        flare['children'][key0.index(level0)]['children'][key1.index(level1)]['children'].append(d)

# uncomment the next two lines to save the result as a JSON file
# with open('filename_here.json', 'w') as outfile:
#     json.dump(flare, outfile)

print(json.dumps(flare, indent=2))

""" the expected output of this json data
{
  "name": "flare",
  "children": [
    {
      "name": "Animal",
      "children": [
        {
          "name": "Mammal",
          "children": [
            {
              "name": "Fox",
              "value": 35000
            },
            {
              "name": "Lion",
              "value": 25000
            }
          ]
        },
        {
          "name": "Fish",
          "children": [
            {
              "name": "Cod",
              "value": 15000
            }
          ]
        }
      ]
    },
    {
      "name": "Plant",
      "children": [
        {
          "name": "Tree",
          "children": [
            {
              "name": "Oak",
              "value": 1500
            }
          ]
        }
      ]
    }
  ]
}
"""
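
For more than three levels, the recursion mentioned above could look something like the following. This is only a minimal sketch, not part of the original GitHub code: the helper names find_or_add_child, insert_path and dataframe_to_flare are made up for illustration, and it assumes every row describes one leaf, with the level columns listed from the top of the tree down. If the same full path appears twice, the later value overwrites the earlier one.

```python
import json
import pandas as pd


def find_or_add_child(node, name):
    """Return the child of `node` called `name`, creating it if needed."""
    for child in node.setdefault("children", []):
        if child["name"] == name:
            return child
    child = {"name": name}
    node["children"].append(child)
    return child


def insert_path(node, names, value):
    """Recursively walk down `names`, then store `value` on the leaf."""
    if not names:
        node["value"] = value
        return
    insert_path(find_or_add_child(node, names[0]), names[1:], value)


def dataframe_to_flare(df, level_cols, value_col, root_name="flare"):
    """Build a flare-style dict with one tree level per column in `level_cols`."""
    root = {"name": root_name, "children": []}
    columns = list(level_cols) + [value_col]
    for row in df[columns].itertuples(index=False, name=None):
        *names, value = row
        # convert NumPy scalars to plain Python numbers so json can serialize them
        if hasattr(value, "item"):
            value = value.item()
        insert_path(root, names, value)
    return root


# same sample data as in the question
df = pd.DataFrame({"group1": ["Animal", "Animal", "Animal", "Plant"],
                   "group2": ["Mammal", "Mammal", "Fish", "Tree"],
                   "group3": ["Fox", "Lion", "Cod", "Oak"],
                   "value": [35000, 25000, 15000, 1500]})

flare = dataframe_to_flare(df, ["group1", "group2", "group3"], "value")
print(json.dumps(flare, indent=2))
```

Because each row is inserted by walking down from the root, the number of levels is just the length of the column list, and the order of the rows does not matter.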
Justin Chae