what is the efficient way to convert json file to dataframe?

Question

I have a json file format:

[
    {
      "2018-05-11 09:45:10": {
        "change": 6.4,
        "change rate": 1.58,
        "code": "00700",
        "current": 412.6,
        "market_cap": 39212.21,
        "turnover": 419550479.8,
        "volume": 102009800.0
      }
    },
    {
      "2018-05-11 09:45:20": {
        "change": 6.8,
        "change rate": 1.67,
        "code": "00700",
        "current": 413.0,
        "market_cap": 39250.23,
        "turnover": 493879299.8,
        "volume": 120029800.0
        }
    },    ... ]

I would like to convert the json file to dataframe format:

       time   code  current  change  change rate  market_cap  \
0   2018-05-11 09:45:10  00700    412.6     6.4         1.58    39212.21   
1   2018-05-11 09:45:20  00700    413.0     6.8         1.67    39250.23   
2   2018-05-11 09:45:30  00700    413.2     7.0         1.72    39269.23   
3   2018-05-11 09:45:40  00700    413.0     6.8         1.67    39250.23   
4   2018-05-11 09:45:50  00700    413.0     6.8         1.67    39250.23
...

        turnover       volume  
0   4.195505e+08  102009800.0  
1   4.938793e+08  120029800.0  
2   5.581315e+08  135588900.0  
3   5.804374e+08  140989900.0  
4   5.956777e+08  144679900.0  
...

It is my code:

def convert_json_file_to_df(file_path):    
    with open(file_path, encoding='utf-8') as fh:
        jd = json.load(fh, encoding='utf-8')

    col_list = ["time", "code", "current", "change", "change rate", "market_cap", "turnover", "volume"]
    df = pd.DataFrame(columns=col_list)

    for d in jd:
        for key, value in d.items():
            df = df.append({"time": key,
                            "code": value["code"],
                            "current": value["current"],
                            "change": value["change"],
                            "change rate": value["change rate"],
                            "market_cap": value["market_cap"],
                            "turnover": value["turnover"],
                            "volume": value["volume"]
                            }, ignore_index=True)

    print(df)

I want to have a short and efficient way to convert json file to dataframe. The code what I am write, I think it is a slow and not good looking. Did there any more efficient way? The other question is how can I append the json file in dict format? Many thanks

Update: The code of append json file

def save_dict_to_json_file(dict, filepath):
    if ((type(dict)!=type({})) or (not dict) or (not filepath)):
        return FALSE

    try:
        with open(filepath, encoding='utf-8') as f:
            json_data = json.load(f, encoding='utf-8')

        json_data.append(dict)

        with open(filepath, mode='w', encoding='utf-8') as f:
            json.dump(json_data, f, ensure_ascii=False, indent=2, sort_keys=True)

        return TRUE

    except Exception as e:
        traceback.print_exc()
        err = sys.exc_info()[1]

        return FALSE, str(err)

Can you explain more `The other question is how can I append the json file in dict format?` ? — jezrael, May 12 '18 at 13:19
@jezrael now I am using [ ] to store the json, when I using `json.load` to read the file, the type of output is . I try to use { } replace [ ], but It will read the file: `json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 5 (char 6)` — shadow dk, May 12 '18 at 13:22
Do you create the json? If so, creating a file like `{date1:{},date2:{}}` (a dict of dicts where the labels of the first dicts are the index value (date) and the labels of the second dict are the columns like you have now) would reduce the problem to `pd.read_json(file_path,orient='index')` — OriolAbril, May 12 '18 at 13:32
@shadowdk - Hard to know without file, are data confidental? — jezrael, May 12 '18 at 13:43
@jezrael I just update the code of what I am using to append json file — shadow dk, May 12 '18 at 14:06
@shadowdk - I check it and hard to know withoud data, I cannot test it and debug :( — jezrael, May 12 '18 at 14:23
@jezrael due to data are real time data, I cannot support here. — shadow dk, May 12 '18 at 14:31
@shadowdk - because if `json.decoder.JSONDecodeError:` it seems some invalid json, is not possible get this invalid one? — jezrael, May 12 '18 at 14:32

score 0 · Accepted Answer · answered May 12 '18 at 13:10

I think you can use list comprehension with flatening nested dictionaries, add new element time and last DataFrame contructor:

L = [dict(v, time=k) for x in jd for k, v in x.items()]
print (L)
[{'change': 6.4, 'change rate': 1.58, 'code': '00700', 'current': 412.6, 
  'market_cap': 39212.21, 'turnover': 419550479.8, 'volume': 102009800.0, 
  'time': '2018-05-11 09:45:10'}, 
 {'change': 6.8, 'change rate': 1.67, 'code': '00700', 'current': 413.0, 
  'market_cap': 39250.23, 'turnover': 493879299.8, 'volume': 120029800.0,
  'time': '2018-05-11 09:45:20'}]

col_list = ["time", "code", "current", "change", 
            "change rate", "market_cap", "turnover", "volume"]
df = pd.DataFrame(L, columns=col_list)
print (df)
                  time   code  current  change  change rate  market_cap  \
0  2018-05-11 09:45:10  00700    412.6     6.4         1.58    39212.21   
1  2018-05-11 09:45:20  00700    413.0     6.8         1.67    39250.23   

      turnover       volume  
0  419550479.8  102009800.0  
1  493879299.8  120029800.0

what is the efficient way to convert json file to dataframe?

1 Answers1