3

I am trying to convert a multi-level hierarchy table into a specific JSON format for a visual I am creating.

I have the data in a pandas dataframe and have tried grouping it by the different levels, but then cannot convert a groupby to a json using pandas. I did also try just converting the dataframe to a json, but the format isn't correct. I am not sure what else to do to get the parent/child format that I am looking for. All the "size" values only need to be 1 so that part seems straightforward enough... Thanks in advance!

**This is what my data looks like**
ColA     ColB     ColC   
Parent1  Child1   
Parent1  Child2   Child2A 
Parent1  Child2   Child2B
Parent1  Child3   Child2A
Parent2  Child1
Parent2  Child2   Child2A

What I am getting from the pandas dataframe to_json is creating the json column by column, so I am losing the hierarchy aspect of it.

so its:

data = {"Parent1}"{"index #":"col2 value"

What I want is:

data = ({ "name":"TEST",
"children": [
  {
    "name": "Parent1",
    "children": 
      [
      {
        "name": "Child1",
        "size": "1"
      },
      {
      "name":"Child2",
        "children": 
        [
        {
          "name":"Child2A",
          "size":"1" 
        },
        {
          "name":"Child2B",
          "size":"1" 
        },
        {
          "name":"Child2C",
          "size":"1" 
        },
        {
          "name":"Child2D",
          "size":"1" 
        },
        ],
      },
    {
      "name":"Parent2",
      "children": [
        {
          "name":"Child2A",
          "size":"1" 
        },
        {
          "name":"Child2B",
          "size":"1" 
        },
        {
          "name":"Child2C",
          "size":"1" 
        },
      ]
    },
    ]
  },
  {
    "name": "Parent3",
    "children": 
    [
      {
        "name": "Child1",
        "size": "1",
      },
      {
      "name":"Child2",
      "children": 
      [
        {
          "name":"Child2A",
          "size":"1" 
        },
        {
          "name":"Child2B",
          "size":"1" 
        },
        {
          "name":"Child2C",
          "size":"1" 
        },
      ],
    },
    {
      "name":"Child3",
      "children": 
      [
        {
          "name":"Child3A",
          "size":"1" 
        },
      ],
    },
    ],
  },
]})
Katy
  • 33
  • 4
  • There is no straight forward library/solution, first you need to convert your [dataframe to JSON](https://stackoverflow.com/questions/39257147/convert-pandas-dataframe-to-json-format) then you need to customize the JSON in your way. – Soumendra Dec 20 '18 at 15:31
  • Thanks for your response, could you offer some guidance as to how i would go about customizing it to get it into that format? – Katy Dec 20 '18 at 15:45

1 Answers1

1

Here we come

import json

data = [
    'Parent1  Child1',
    'Parent1  Child2   Child2A',
    'Parent1  Child2   Child2B',
    'Parent1  Child3   Child2A',
    'Parent2  Child1',
    'Parent2  Child2   Child2A',
]

tree = {}

for d in data:
    node = None
    for item in d.split():
        name = item.strip()  # dont need spaces
        current_dict = tree if node is None else node
        node = current_dict.get(name)
        if not node:
            node = {}
            current_dict[name] = node


def walker(src, res):
    for name, value in src.items():
        node = {'name': name, 'size': 1}
        if 'children' not in res:
            res['children'] = []
        res['children'].append(node)
        walker(value, node)

result = {'name': 'TEST'}
walker(tree, result)

print (json.dumps(result, indent = True))
grapes
  • 8,185
  • 1
  • 19
  • 31
  • Grapes, thanks for this, one question. I ran a test and for elements with multiple words the code splits each word into its own "name", is there a way to have this work if the data was "Parent A", or "Jane Smith"? or does it have to be "JaneSmith"? – Katy Dec 20 '18 at 16:48
  • You can use underscope instead of spaces. Otherwise how do you suggest to split the line like `john doe smith`? You never know where one person ends and another starts – grapes Dec 20 '18 at 16:56
  • Best idea is to use commas in your file instead of spaces. Then you get CSV and can easily use long names. But in this case you should update the code to `split(',')` and not forget to call `strip()` to get rid of spaces on both sides – grapes Dec 20 '18 at 16:59
  • thanks all, this mostly worked, the only issue that I am now working out is that this does append a "size" to every single name, and in cases where there are "children" elements there shouldn't be a size. – Katy Dec 20 '18 at 20:37
  • This did work, one issue that did come up that needed some additional cleanup is that ever child element was given a size, and if a child element also has children then it doesn't need a size. Otherwise grapes idea worked like a charm! – Katy Dec 27 '18 at 19:07