0

I am looking into ways on how I can flatten the nested list in a dict which is nested in a list using python.

The example as below:

[
 {
    "id": 8,
    "category": {
        "id": 0,
        "name": "lion"
    },
    "name": "Leon",
    "photoUrls": [
        "123",
        "444",
    ],
    "tags": [
        {
            "id": 1,
            "name": "TagLion"
        },
        {
            "id": 2,
            "name": "KingOfTheJungle"
        }
    ],
},

{
    "id": 83,
    "category": {
        "id": 0,
        "name": "dog UPDATED"
    },
    "name": "Buff",
    "photoUrls": [
        "333",
    ],
    "tags": [
        {
            "id": 1,
            "name": "TagNumber1UPDATED"
        },
        {
            "id": 2,
            "name": "DogWithStickUPDATED"
        }
    ],
}
]

From the example above, which is a return from API, I would like to write the output to csv. But the catch here is on the "tags" where it is a nested list. I would like the above result to be flatten to csv format as below:

id | category                       | name | photoUrls     | tags
 8 |{'id': 0, 'name': 'dog UPDATED'}| Leon | 123           | {'id': 1, "name": "TagLion"}
83 |{'id': 0, 'name': 'dog UPDATED'}| Buff | 333           | {"id": 1,"name": "TagNumber1UPDATED"}
83 |{'id': 0, 'name': 'dog UPDATED'}| Buff | 333           | {"id": 2,"name": "name": "DogWithStickUPDATED"}

How can I do this using python? Would like this to be set as a configuration and when loading to csv, python will look for this config to flatten the array "tags"

EDIT: would like to flatten the photourls column as well which is a array. result as below , by piping it instead of splitting it.

id | category                       | name | photoUrls     | tags
 8 |{'id': 0, 'name': 'dog UPDATED'}| Leon | 123 |444      | {'id': 1, "name": "TagLion"}
 8 |{'id': 0, 'name': 'dog UPDATED'}| Leon | 123           | {'id': 1, "name": "TagLion"}
 8 |{'id': 0, 'name': 'dog UPDATED'}| Leon | 123           | {'id': 2, "name": "KingOfTheJungle"}
83 |{'id': 0, 'name': 'dog UPDATED'}| Buff | 333           | {"id": 1,"name": "TagNumber1UPDATED"}
83 |{'id': 0, 'name': 'dog UPDATED'}| Buff | 333           | {"id": 2,"name": "name": "DogWithStickUPDATED"}
Ajax1234
  • 69,937
  • 8
  • 61
  • 102
Derek Lee
  • 475
  • 1
  • 6
  • 20

2 Answers2

1

You could use the power of the magical pandas package:

The code:

import pandas as pd

data = [] # your list is here

df = pd.DataFrame(data)

# expand 'tags' column into multiple rows
tags = df.apply(lambda x: pd.Series(x['tags']), axis=1).stack().reset_index(level=1, drop=True)
tags.name = 'tags'
df = df.drop('tags', axis=1).join(tags)

print(df)

Prints:

                           category  id  name photoUrls                                      tags
0         {'id': 0, 'name': 'lion'}   8  Leon     [123]              {'id': 1, 'name': 'TagLion'}
0         {'id': 0, 'name': 'lion'}   8  Leon     [123]      {'id': 2, 'name': 'KingOfTheJungle'}
1  {'id': 0, 'name': 'dog UPDATED'}  83  Buff     [333]    {'id': 1, 'name': 'TagNumber1UPDATED'}
1  {'id': 0, 'name': 'dog UPDATED'}  83  Buff     [333]  {'id': 2, 'name': 'DogWithStickUPDATED'}

For dumping to CSV, you could use .to_csv() method.


You could also extract that "expand column" logic into a separate method and reuse:

def expand_column(df, column_name):
    c = df.apply(lambda x: pd.Series(x[column_name]), axis=1).stack().reset_index(level=1, drop=True)
    c.name = column_name
    return df.drop(column_name, axis=1).join(c)

Usage:

df = pd.DataFrame(data)
df = expand_column(df, 'tags')
alecxe
  • 462,703
  • 120
  • 1,088
  • 1,195
  • I'll try it out in a minute. what if I want to expand the category as well? if there's multiple array in the category and multiple array in tags? – Derek Lee Dec 18 '18 at 01:51
  • @DerekLee yeah, you could do the exact same thing with `category` column. – alecxe Dec 18 '18 at 01:52
  • @DerekLee I've added a reusable function to the answer, but not sure if you can directly apply it to `category` as it is a dictionary and not a list as tags.. – alecxe Dec 18 '18 at 02:02
  • Thanks @alexcxe! that really helps alot. Just 1 more question, looking into the edit section, i wanted to pip the array in the **photosurl** as well. How can i do that? :) – Derek Lee Dec 18 '18 at 02:13
1

You can use a nested comprehension:

import csv
d = [{'id': 8, 'category': {'id': 0, 'name': 'lion'}, 'name': 'Leon', 'photoUrls': ['123'], 'tags': [{'id': 1, 'name': 'TagLion'}, {'id': 2, 'name': 'KingOfTheJungle'}]}, {'id': 83, 'category': {'id': 0, 'name': 'dog UPDATED'}, 'name': 'Buff', 'photoUrls': ['333'], 'tags': [{'id': 1, 'name': 'TagNumber1UPDATED'}, {'id': 2, 'name': 'DogWithStickUPDATED'}]}]
new_d = [[i['id'], i['category'], i['name'], *i["photoUrls"], c] for i in d for c in i['tags']]
with open('results.csv', 'w') as f:
  write = csv.writer(f)
  write.writerows([['id', 'category', 'name', 'photoUrls', 'tags'], *new_d])

Output:

id,category,name,photoUrls,tags
8,"{'id': 0, 'name': 'lion'}",Leon,123,"{'id': 1, 'name': 'TagLion'}"
8,"{'id': 0, 'name': 'lion'}",Leon,123,"{'id': 2, 'name': 'KingOfTheJungle'}"
83,"{'id': 0, 'name': 'dog UPDATED'}",Buff,333,"{'id': 1, 'name': 'TagNumber1UPDATED'}"
83,"{'id': 0, 'name': 'dog UPDATED'}",Buff,333,"{'id': 2, 'name': 'DogWithStickUPDATED'}"
Ajax1234
  • 69,937
  • 8
  • 61
  • 102
  • Does this work if my **category** column has mutiple array as well? How do i flatten the **tags** and **category** together? – Derek Lee Dec 18 '18 at 02:01
  • @DerekLee Please post your desired output from a dataset that contains multiple category values. – Ajax1234 Dec 18 '18 at 02:03
  • sorry ignore the previous question. my main question is, how to flatten the **photosurl** array field and pip it? Look into the edit section on my post for the desire output. thanks :) – Derek Lee Dec 18 '18 at 02:14