0

I know there are quite similar questions so far, but I just don't get a right idea sorry.

I would like to convert a complex json to a clean pandas dataframe.

My Code so far:

with open('JSON_Input.json', 'r') as json_file:
    json_data = json.load(json_file)
    json_data = json.loads(json_data)

This creates the following complex, nested json object:

json_data 

{'time': 0,
 'day1': [{'time': 0,
   'coordinates': [{'x': 1202.5, 'y': 486, 'time': 3276},
    {'x': 1162.5, 'y': 484, 'time': 3331},
    {'x': 742.5, 'y': 492.5, 'time': 3487},
    {'x': 673.5, 'y': 501.5, 'time': 3514},
    {'x': 636, 'y': 508.5, 'time': 3539}]},
  {'path': 'path1',
   'time': 3558,
   'coordinates': [{'x': 1237, 'y': 173, 'time': 5437},
    {'x': 1240, 'y': 182, 'time': 5601},
    {'x': 1260, 'y': 161, 'time': 7289},
    {'x': 1263, 'y': 165, 'time': 7465},
    {'x': 1482, 'y': 114.5, 'time': 8072},
    {'x': 1482, 'y': 114, 'time': 8197},
    {'x': 1482, 'y': 126.5, 'time': 9539}]},
  {'path': 'path2',
   'time': 23620,
   'coordinates': [{'x': 227.5, 'y': 420, 'time': 25228},
    {'x': 235, 'y': 418, 'time': 25426}]},
  {'path': 'path3',
   'time': 35891,
   'coordinates': [{'x': 681.5, 'y': 431, 'time': 36648},
    {'x': 704.5, 'y': 427.5, 'time': 36661},
    {'x': 874.5, 'y': 420.5, 'time': 36714},
    {'x': 909.5, 'y': 422, 'time': 36734}]}],
 'day2': {'path': 'path4',
  'time': 36743,
  'coordinates': [{'x': 600, 'y': 622.5, 'time': 37390},
   {'x': 603, 'y': 594.5, 'time': 37448},
   {'x': 605, 'y': 541.5, 'time': 37478},
   {'x': 608.5, 'y': 481.5, 'time': 37495},
   {'x': 620, 'y': 369, 'time': 37530},
   {'x': 624.5, 'y': 329, 'time': 37547},
   {'x': 636, 'y': 366, 'time': 38043}]}}

Now how can get a clean dataframe out of this json file?

penta
  • 2,536
  • 4
  • 25
  • 50
Larsus123
  • 5
  • 2
  • show the desired dataframe output please – gold_cy Mar 26 '19 at 11:32
  • That's the next problem. I was just given this json file and was told to analyze it and to make a dataframe that can be shown in Excel for example. Neither me nor my collegue knows the expected dataframe output =/ – Larsus123 Mar 26 '19 at 11:42
  • that makes it kind of hard to help. so you can either provide context for exactly what it is you have to analyze or you go back to your boss and ask for more information – gold_cy Mar 26 '19 at 11:44
  • Yes I will do that. Thanks for your time. When I have more information, I will write again. – Larsus123 Mar 26 '19 at 11:45
  • @Larsus123 any purticular reason why you are doing `json_data = json.load(json_file) & json_data = json.loads(json_data)` – penta Mar 26 '19 at 12:10
  • Hey penta. I did this, because json_data = json.load(json_file) just gave me a string, but additional json_data = json.loads(json_data) gave me this dictionary. With this I have the feeling I can work a little bit – Larsus123 Mar 26 '19 at 13:13
  • @penta I am getting the hang of it. My collegue is at holiday, so I have to solve it either way. But it seems, that there is acutally a structure in this data :) – Larsus123 Mar 26 '19 at 13:26
  • @Larsus123 refer this, this would help you in clearing confusion https://stackoverflow.com/questions/39719689/what-is-the-difference-between-json-load-and-json-loads-functions load doesn't give you a string loads does – penta Mar 26 '19 at 13:33
  • Thank you very much – Larsus123 Mar 26 '19 at 13:46

1 Answers1

0

This is tricky. You're going to end up with lots of nulls, and I also don't know exactly how you want the end datframe to look like. But maybe this can get you going in the right direction:

jsonObj = {'time': 0,
 'day1': [{'time': 0,
   'coordinates': [{'x': 1202.5, 'y': 486, 'time': 3276},
    {'x': 1162.5, 'y': 484, 'time': 3331},
    {'x': 742.5, 'y': 492.5, 'time': 3487},
    {'x': 673.5, 'y': 501.5, 'time': 3514},
    {'x': 636, 'y': 508.5, 'time': 3539}]},
  {'path': 'path1',
   'time': 3558,
   'coordinates': [{'x': 1237, 'y': 173, 'time': 5437},
    {'x': 1240, 'y': 182, 'time': 5601},
    {'x': 1260, 'y': 161, 'time': 7289},
    {'x': 1263, 'y': 165, 'time': 7465},
    {'x': 1482, 'y': 114.5, 'time': 8072},
    {'x': 1482, 'y': 114, 'time': 8197},
    {'x': 1482, 'y': 126.5, 'time': 9539}]},
  {'path': 'path2',
   'time': 23620,
   'coordinates': [{'x': 227.5, 'y': 420, 'time': 25228},
    {'x': 235, 'y': 418, 'time': 25426}]},
  {'path': 'path3',
   'time': 35891,
   'coordinates': [{'x': 681.5, 'y': 431, 'time': 36648},
    {'x': 704.5, 'y': 427.5, 'time': 36661},
    {'x': 874.5, 'y': 420.5, 'time': 36714},
    {'x': 909.5, 'y': 422, 'time': 36734}]}],
 'day2': {'path': 'path4',
  'time': 36743,
  'coordinates': [{'x': 600, 'y': 622.5, 'time': 37390},
   {'x': 603, 'y': 594.5, 'time': 37448},
   {'x': 605, 'y': 541.5, 'time': 37478},
   {'x': 608.5, 'y': 481.5, 'time': 37495},
   {'x': 620, 'y': 369, 'time': 37530},
   {'x': 624.5, 'y': 329, 'time': 37547},
   {'x': 636, 'y': 366, 'time': 38043}]}}










import pandas as pd
import re


def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

results = pd.DataFrame()
for k in jsonObj:

    flat = flatten_json(jsonObj[k])                      


    temp_df = pd.DataFrame()
    special_cols = []

    columns_list = list(flat.keys())
    for item in columns_list:
        try:
            row_idx = re.findall(r'\_(\d+)\_', item )[0]
        except:
            special_cols.append(item)
            continue
        column = re.findall(r'\_\d+\_(.*)', item )[0]
        column = column.replace('_', '')

        row_idx = int(row_idx)
        value = flat[item]

        temp_df.loc[row_idx, column] = value

    for item in special_cols:
        temp_df[item] = flat[item]

    if 'day' in k:
        temp_df['day'] = k
    results = results.append(temp_df).reset_index(drop=True)

results = results.dropna(axis=1, how='all')
chitown88
  • 27,527
  • 4
  • 30
  • 59
  • Thank you for your support. I found a way program a little bit around and get some insights. If this problem scales up, I am gonna try your code, which looks a lot more efficient than mine. Thanks again! – Larsus123 Mar 27 '19 at 12:52