3

Assume I have a DataFrame df like:

source      tables      columns   data_type   length
src1        table1      col1      INT         4
src1        table1      col2      CHAR        2
src1        table2      col1      CHAR        2
src2        table1      col1      INT         4
src2        table1      col2      DATE        3

NOTE: the DataFrame also has another 4 columns which aren't relevant to the problem

Need an output that looks similar to:

{
  "src1": {
    "table1": {
      "col1": {
        "type": "INT"
        "length": 4
      },
      "col2": {
        "type": "CHAR"
        "length": 2
      }
    },
    "table2": {
      "col1": {
        "type": "CHAR"
        "length": 2
      }
    }
  },
  "src2": {
    "table1": {
      "col1": {
        "type": "INT"
        "length": 4
      },
      "col2": {
        "type": "DATE"
        "length": 3
      }
    }
  }
}

The code the I currently have produces the same output as above with the exclusion of the actual data type values (ie. instead of "type": "CHAR", I'm getting "type": "") as I'm not sure how I'd be able to nest the values accordingly. Here is the code:

def make_nested(df): 
        f = lambda: defaultdict(f)   
        data = f()  

        for row in df.to_numpy().tolist():
            t = data
            for r in row[:-6]:
                t = t[r]
            t[row[-6]] = {
                "type": '',
                "length": ''
            }

        return data

My question is how can I properly append the data_type and length column values into each columns JSON object without sacrificing the exact format? Thanks.

cs95
  • 379,657
  • 97
  • 704
  • 746

1 Answers1

2
def make_nested(df): 
    f = lambda: defaultdict(f)   
    data = f()  

    for row in df.to_numpy().tolist():
        t = data
        for r in row[:-3]:
            t = t[r]
        t[row[-3]] = {
            "type": row[-2],
            "length": row[-1]
        }

    return data

The last two column values go inside the third level, so thats what you should do.

cs95
  • 379,657
  • 97
  • 704
  • 746
  • Thank you so much! Last thing I need to do is add another column value from my DataFrame within the second (table) level. How would I be able to do this? – weovibewvoibweoivwoiv Mar 02 '20 at 16:49
  • 1
    @weovibewvoibweoivwoiv hmm, adding a second column at a particular level breaks all assumptions in this code. It should be easy enough to do, though. In the inner loop you use `for index, r in enumerate(row[:-3]):` Then check the condition index == 1 and initialise the entry for your second column. – cs95 Mar 02 '20 at 16:53
  • Posted another question - https://stackoverflow.com/questions/60494426/add-column-from-pandas-dataframe-into-deeply-nested-json-at-a-specific-object-le – weovibewvoibweoivwoiv Mar 02 '20 at 18:12