
I'm wondering if there's a concise, pythonic way to do this. I have a DataFrame with a single `phone` column holding JSON:

  phone
0 {"brand":{"type":"android"},"names":[{"id":"1", "name":"a-1"},{"id":"2", "name":"a-2"}]}
1 {"brand":{"type":"iphone"},"names":[{"id":"3", "name":"i-1"},{"id":"4", "name":"i-2"}]}

I want to expand the JSON field into regular data columns, to get this:

   type       id  name
0  android    1   a-1
1  android    2   a-2
2  iphone     3   i-1
3  iphone     4   i-2  


I have found a good solution:
import pandas as pd

def parser_expand_json(data):
    # turn a dict into a Series: keys become the index, values the data
    # (for a plain dict this is equivalent to pd.Series(data))
    keys = []
    values = []
    for key in data:
        keys.append(key)
        values.append(data.get(key))

    return pd.Series(values, index=keys)

# putting it all together
def test():
    data = [{'brand': {'type': 'android'}, 'names': [{'id': '1', 'name': 'a-1'}, {'id': '2', 'name': 'a-2'}]},
            {'brand': {'type': 'iphone'}, 'names': [{'id': '3', 'name': 'i-1'}, {'id': '4', 'name': 'i-2'}]}]
    df = pd.DataFrame(data)

    # expand the json list into N rows: merge each element of 'names' in as its
    # own column, then melt those columns back down into rows
    df = df.merge(df['names'].apply(pd.Series), right_index=True, left_index=True).drop('names', axis=1).melt(
        id_vars=['brand'], value_name='names').drop('variable', axis=1)

    """
                       brand                           names
    0  {u'type': u'android'}  {u'id': u'1', u'name': u'a-1'}
    1   {u'type': u'iphone'}  {u'id': u'3', u'name': u'i-1'}
    2  {u'type': u'android'}  {u'id': u'2', u'name': u'a-2'}
    3   {u'type': u'iphone'}  {u'id': u'4', u'name': u'i-2'}
    """
    print(df)

    # expand each json dict's keys into named columns
    df = pd.concat([df, df['brand'].apply(parser_expand_json), df['names'].apply(parser_expand_json)], axis=1).drop(
        ['brand', 'names'], axis=1)

    """
          type id name
    0  android  1  a-1
    1   iphone  3  i-1
    2  android  2  a-2
    3   iphone  4  i-2
    """
    print(df)
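
As an aside, `DataFrame.explode` can replace the merge/melt step entirely; a minimal sketch, assuming pandas >= 0.25 and the same `data` list:

# explode turns each element of the 'names' lists into its own row
df = pd.DataFrame(data).explode('names').reset_index(drop=True)
# expand both dict columns into real columns and stitch them back together
df = pd.concat([df['brand'].apply(pd.Series), df['names'].apply(pd.Series)], axis=1)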
kolin
  • `json_normalize` is your friend. Also can you add the original JSON file? – Code Different Aug 21 '19 at 14:47
  • @Code Different file content such as: 2019-08-21 - {"brand":{"type":"android"},"names":[{"id":"1", "name":"a-1"},{"id":"2", "name":"a-2"}]} 2019-08-22 - {"brand":{"type":"android"},"names":[{"id":"1", "name":"a-1"},{"id":"2", "name":"a-2"}]} – kolin Aug 22 '19 at 03:23

1 Answer


A solution using lists to manually build a new DataFrame with the desired structure:

import pandas as pd

json = [
  {"brand":{"type":"android"},"names":[{"id":"1", "name":"a-1"},{"id":"2", "name":"a-2"}]},
  {"brand":{"type":"iphone"},"names":[{"id":"3", "name":"i-1"},{"id":"4", "name":"i-2"}]}
  ]

json_data = {'phone': json}

df_1 = pd.DataFrame(json_data)

type_list = []
id_list = []
name_list = []

# build one output row per entry in 'names', carrying the brand type along
for row in df_1.phone:
    for item in row['names']:
        type_list.append(row['brand']['type'])
        id_list.append(item['id'])
        name_list.append(item['name'])

data = {'type':type_list, 'id':id_list, 'name':name_list}

df_2 = pd.DataFrame(data)

To use `json_normalize()`, we must first restructure the JSON into the desired column structure. The solution in this case looks like:

import pandas as pd
# deprecated in pandas >= 1.0; use pd.json_normalize there instead
from pandas.io.json import json_normalize

json = [
  {"brand":{"type":"android"},"names":[{"id":"1", "name":"a-1"},{"id":"2", "name":"a-2"}]},
  {"brand":{"type":"iphone"},"names":[{"id":"3", "name":"i-1"},{"id":"4", "name":"i-2"}]}
  ]

json_mod = []
for row in json:
    for item in row['names']:
        json_mod.append({'type':row['brand']['type'],'id':item['id'],'name':item['name']})

df_3 = json_normalize(json_mod)

df_2 and df_3 both appear as:

  id name     type
0  1  a-1  android
1  2  a-2  android
2  3  i-1   iphone
3  4  i-2   iphone
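
For reference, `json_normalize` can also do the flattening itself via its `record_path` and `meta` arguments, removing the need for the nested loop; a minimal sketch over the same `json` list:

# walk into each 'names' list, carrying brand.type along as row metadata
df_4 = json_normalize(json, record_path='names', meta=[['brand', 'type']])
df_4 = df_4.rename(columns={'brand.type': 'type'})

df_4 matches df_2 and df_3.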
pjw
  • that is a way, but I'm wondering if there's a concise, pythonic (pandastic?) way to do this? – kolin Aug 22 '19 at 03:22
  • @kolin I edited my answer to include a solution using `json_normalize()`. Although this still requires the nested `for` loop to restructure the json, this is a more concise approach. – pjw Aug 22 '19 at 14:23
  • thanks for your reply, but it's still not the "beautiful" answer I want. What about df.apply()? – kolin Aug 23 '19 at 03:17