
I read these questions:

However, I'm still not able to convert a CSV file to a hierarchy for JSON. All the scripts I found on Stack Overflow are specific to a certain problem. Let's say there are three variables which have to be grouped:

condition   target  sub
oxygen      tree    G1
oxygen      tree    G2
water       car     G3
water       tree    GZ
fire        car     GTD
oxygen      bomb    GYYS

This should result in a hierarchy like this (as far as I can tell):

oxygen
    - tree  
        - G1
        - G2
    - bomb
        - GYYS
water 
    - car
        - G3
    - tree
        - GZ
fire 
    - car   
        - GTD

And these have to be grouped in a nested structure like:

{
  "name": "oxygen",
  "children": [
    {
      "name": "tree",
      "children": [
        {"name": "G1"},
        {"name": "G2"}
      ]
    },
    {
      "name": "bomb",
      "children": [
        {"name": "GYYS"}
      ]
    }
  ]
}
etc.

I tried every script on this site, but I'm not able to write a generic function that produces a flare.json like that. I can post my code, but it is essentially the same as in the links provided above. So I'm asking for some simple code (or an example that can help me) to convert this to a flare.json-like structure.

CodeNoob
  • Your "three variables" don't appear anywhere in your json example, there is no obvious connection. So please explain in detail how the json structure could be generated from the data in the csv file – Christian König May 03 '17 at 10:56
  • I'm not that familiar with JSON but I tried to give a short example of what I want (see edit) @ChristianKönig – CodeNoob May 03 '17 at 11:12
  • Can you please copy and paste your JSON example into the viewer/validator at http://jsoneditoronline.org/ and at least provide us with a valid JSON example? Your current example is not valid JSON. Also please provide the rules (high-level logic) for how the CSV should be converted into JSON: what becomes children, and why? This information will help us give you a valid answer. – Hett May 03 '17 at 11:35
  • Thank you for the online editor, I fixed my JSON now @Hett – CodeNoob May 03 '17 at 12:30
  • @CodeNoob I think you have provided wrong JSON. From the csv I guess that children of the *tree* should be [G1, G2] and [GYYS] should be presented only as the child of the bomb. Am I right? You need to explain the hierarchy logic correctly (because only you know what you need). – Hett May 03 '17 at 13:01
  • I updated my question with a simple tree format and the updated JSON hopefully it's more clear now @Hett – CodeNoob May 03 '17 at 13:56
  • @CodeNoob I've created a small repo (https://github.com/hettmett/csv_to_json) with the answer posted below. You can use it freely or even modify and contribute. If you have concerns regarding the answer, please, let me know. – Hett May 04 '17 at 12:20

2 Answers


The defaultdict class from the standard library's collections module makes many problems with hierarchical structures easy to solve, so I've put together a sample solution for your problem. Before running the script, please make sure you have a comma-separated CSV file named test.csv, or change the CSV reading logic in the script.
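To illustrate why a recursive defaultdict helps with hierarchies, here is a minimal sketch. The row values are taken from the question; `insert_path` is a hypothetical helper name used only for this illustration.

```python
from collections import defaultdict


def ctree():
    # Each missing key creates another ctree, so nested levels appear on demand.
    return defaultdict(ctree)


def insert_path(tree, path):
    # Hypothetical helper: walk down the tree, creating nodes as needed.
    node = tree
    for key in path:
        node = node[key]


tree = ctree()
insert_path(tree, ["oxygen", "tree", "G1"])
insert_path(tree, ["oxygen", "tree", "G2"])
insert_path(tree, ["oxygen", "bomb", "GYYS"])

print(sorted(tree["oxygen"]))          # ['bomb', 'tree']
print(sorted(tree["oxygen"]["tree"]))  # ['G1', 'G2']
```

No explicit "does this key exist yet" check is needed: looking up a missing key is what creates the node, which is why rows that share a prefix end up grouped under the same parent.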

Here's the csv file I've tested the script on.

condition, target, sub, dub
oxygen,tree,G1,T1
oxygen,tree,G2,T1
oxygen,tree,G2,T2
water,car,G3,T1
water,tree,GZ,T1
water,tree,GZ,T2
fire,car,GTD,T3
oxygen,bomb,GYYS,T1

In principle the script should work for a CSV file with any number of columns, but you should test it yourself to be sure.

import csv
from collections import defaultdict


def ctree():
    """ One of the Python gems: a defaultdict whose default value is another
    tree, which makes a dynamic tree structure possible.

    """
    return defaultdict(ctree)


def build_leaf(name, leaf):
    """ Recursive function to build desired custom tree structure

    """
    res = {"name": name}

    # add children node if the leaf actually has any children
    if len(leaf.keys()) > 0:
        res["children"] = [build_leaf(k, v) for k, v in leaf.items()]

    return res


def main():
    """ The main routine, composed of two parts.

    First it parses the csv file and builds a tree hierarchy from it.
    Second it recursively iterates over the tree and builds a custom
    json-like structure (via dict).

    The last part just prints the result.

    """
    tree = ctree()
    # NOTE: you need to have test.csv file as neighbor to this file
    with open('test.csv') as csvfile:
        reader = csv.reader(csvfile)
        for rid, row in enumerate(reader):

            # skipping first header row. remove this logic if your csv is
            # headerless
            if rid == 0:
                continue

            # usage of python magic to construct dynamic tree structure and
            # basically grouping csv values under their parents
            leaf = tree[row[0]]
            for cid in range(1, len(row)):
                leaf = leaf[row[cid]]

    # building a custom tree structure
    res = []
    for name, leaf in tree.items():
        res.append(build_leaf(name, leaf))

    # printing results into the terminal
    import json
    print(json.dumps(res))


# so let's roll (only when run as a script, not when imported)
if __name__ == "__main__":
    main()

And here's the json segment from the result:

{
    "name": "oxygen",
    "children": [
      {
        "name": "tree",
        "children": [
          {
            "name": "G2",
            "children": [
              {
                "name": "T2"
              },
              {
                "name": "T1"
              }
            ]
          },
          {
            "name": "G1",
            "children": [
              {
                "name": "T1"
              }
            ]
          }
        ]
      },
      {
        "name": "bomb",
        "children": [
          {
            "name": "GYYS",
            "children": [
              {
                "name": "T1"
              }
            ]
          }
        ]
      }
    ]
  }
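If the goal is a single flare.json-style file, the list of top-level nodes can be wrapped in one root object and written to disk. This is only a sketch under assumptions: `wrap_as_root`, the root name "flare", and the output filename are hypothetical choices, and the sample nodes merely mimic the shape of the script's result.

```python
import json


def wrap_as_root(nodes, root_name="flare"):
    # Hypothetical helper: put the top-level nodes under a single root object.
    return {"name": root_name, "children": nodes}


# Sample top-level nodes in the same shape as the script's result.
nodes = [
    {"name": "oxygen", "children": [{"name": "tree"}, {"name": "bomb"}]},
    {"name": "fire", "children": [{"name": "car"}]},
]

root = wrap_as_root(nodes)

# The output filename is an assumption; indent=2 only makes the file readable.
with open("flare.json", "w") as fh:
    json.dump(root, fh, indent=2)
```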

Please, let me know if you have any further questions and issues. Happy pythonning ;)

Hett
  • Thank you so much! This is much easier to understand than all those other answers on Stack Overflow – CodeNoob May 04 '17 at 15:00
  • Glad this is helpful! – Hett May 04 '17 at 18:33
  • @Hett what if I have URLs in the csv file? I mean, I have a URL source and a URL target, and I would like to create a tree based on the directories in the URLs and draw a graph like this https://bl.ocks.org/mbostock/1062288. Any idea? – Dany M Apr 06 '19 at 14:52

An alternative solution, using the convtools code-generation library:

from convtools import conversion as c
from convtools.contrib.tables import Table


table = Table.from_csv(
    "tmp2.csv", header=True, dialect=Table.csv_dialect(delimiter="\t")
)

child = None
for column in reversed(table.columns):
    if child is None:
        # the innermost children
        child = c.iter(c.item(column)).as_type(list)
    else:
        child = c.group_by(c.item(column)).aggregate(
            {
                "name": c.item(column),
                "children": c.ReduceFuncs.Array(c.this()).pipe(child),
            }
        )
# this is where code generation happens
converter = child.gen_converter()

converter(table.into_iter_rows(dict))

Outputs:

[
    {
        "name": "oxygen",
        "children": [
            {"name": "tree", "children": ["G1", "G2"]},
            {"name": "bomb", "children": ["GYYS"]},
        ],
    },
    {
        "name": "water",
        "children": [
            {"name": "car", "children": ["G3"]},
            {"name": "tree", "children": ["GZ"]},
        ],
    },
    {"name": "fire", "children": [{"name": "car", "children": ["GTD"]}]},
]
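Note that the leaves in this output are plain strings rather than {"name": ...} objects. If the flare-style shape from the question is needed, a small post-processing step in plain Python could convert them. A minimal sketch; `name_leaves` is a hypothetical helper, not part of convtools, and the sample input is one entry copied from the output above:

```python
def name_leaves(node):
    # Hypothetical helper: turn string leaves into {"name": ...} objects.
    if isinstance(node, str):
        return {"name": node}
    return {
        "name": node["name"],
        "children": [name_leaves(child) for child in node["children"]],
    }


result = [
    {"name": "fire", "children": [{"name": "car", "children": ["GTD"]}]},
]

converted = [name_leaves(node) for node in result]
print(converted)
# [{'name': 'fire', 'children': [{'name': 'car', 'children': [{'name': 'GTD'}]}]}]
```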

westandskif