0

I have a DataFrame that contains pseudo nested columns (i.e., . as a name) made by using Panda's function 'json_normalize' in a nested JSON. I would like to build a JSON out of the DataFrame that has nested values.

Here's an example of what I'm trying to do. Note that this is just an example to make it more understandable, it's not exactly my use case. My use case requires Pandas to do some "heavy" Dataframe transformations.

DataFrame like:

   id name.first name.last
0   1     Coleen      Volk
1   2       Mose    Regner

The JSON I want is:

[{'id': 1,
 'name': {'first': 'Coleen',
         'last': 'Volk'}},
{'id': 2, 
'name': {'first': 'Mose', 
         'last': 'Regner'}}]

-- Simple code to replicate:

data = [{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk', 'nickname': 'Perico'}},
    {'id': 2, 'name': {'first': 'Mose', 'last': 'Regner', 'nickname': 'Palotes'}}]

df = pd.io.json.json_normalize(data)
#some transormations using Pandas
df = df.drop(columns=['name.nickname'])

#Now I want to build the JSON
Iñigo
  • 2,500
  • 2
  • 10
  • 20
  • why even put it in the dataframe to begin with? you can just drop the key you don't want from the JSON since it is already in the desired format – gold_cy Mar 29 '19 at 11:46
  • I need Pandas to do some transformation. The example above is just to make it as simple as possible, it's not my actual use case – Iñigo Mar 29 '19 at 11:58
  • can't you remove nickname before putting it in the dataframe then – gold_cy Mar 29 '19 at 12:07
  • I could, but as I said, this is an mock example of what I'm trying to do. My actual data is not like that, it's not names. I need Pandas to normalize values, convert to different types, ... Dropping a column is just an example of a transformation – Iñigo Mar 29 '19 at 12:09

1 Answers1

2

As per the function provided here by @Parsa T

import pandas as pd

data = [{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk', 'nickname': 'Perico'}},
    {'id': 2, 'name': {'first': 'Mose', 'last': 'Regner', 'nickname': 'Palotes'}}]

df = pd.io.json.json_normalize(data)
#some transormations using Pandas
df = df.drop(columns=['name.nickname'])





def set_for_keys(my_dict, key_arr, val):
    """
    Set val at path in my_dict defined by the string (or serializable object) array key_arr
    """
    current = my_dict
    for i in range(len(key_arr)):
        key = key_arr[i]
        if key not in current:
            if i==len(key_arr)-1:
                current[key] = val
            else:
                current[key] = {}
        else:
            if type(current[key]) is not dict:
                print("Given dictionary is not compatible with key structure requested")
                raise ValueError("Dictionary key already occupied")

        current = current[key]

    return my_dict

def to_formatted_json(df, sep="."):
    result = []
    for _, row in df.iterrows():
        parsed_row = {}
        for idx, val in row.iteritems():
            keys = idx.split(sep)
            parsed_row = set_for_keys(parsed_row, keys, val)

        result.append(parsed_row)
    return result


#Where df was parsed from json-dict using json_normalize
to_formatted_json(df, sep=".")

Output:

Out[9]: 
[{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}},
 {'id': 2, 'name': {'first': 'Mose', 'last': 'Regner'}}]
chitown88
  • 27,527
  • 4
  • 30
  • 59
  • Thanks! I was expecting something implemented in Pandas, but sometimes it's not possible to get it. I looked for a while to check if there was a previous question but couldn't find anything. Using inverse as a keyword wasn't easy to think about. – Iñigo Mar 29 '19 at 15:25