Normalize json column and join with rest of dataframe

Question

This is my first question here on stackoverflow so please don't roast me. I was trying to find similar problems on the internet and actually there are several, but for me the solutions didn't work.

I have created this dataframe:

import pandas as pd
from ast import literal_eval
d = {'order_id': [1], 'email': ["hi@test.com"], 'line_items': ["[{'sku':'testproduct1', 'quantity':'2'},{'sku':'testproduct2','quantity':'2'}]"]}
orders = pd.DataFrame(data=d)

It looks like this:

order_id    email         line_items
1           hi@test.com   [{'sku':'testproduct1', 'quantity':'2'},{'sku':'testproduct2','quantity':'2'}]

I want the dataframe to look like this:

order_id    email               line_items.sku        line_items.quantity
1           hi@test.com         testproduct1           2
1           hi@test.com         testproduct2           2

I used the following code to change the type of line_items from string to dict:

orders.line_items = orders.line_items.apply(literal_eval)

Normally I would use json_normalize now to flatten the line_items column. But I also want to keep the id and don't know how to do that. I also want to avoid any loops.

Is there anyone who can help me with this issue?

Kind regards
joant95

please add your input json and the dataframe you're joining too. — Umar.H, Jul 27 '21 at 09:07
Welcome to stackoverflow, please read [tour] and [mre] and in this case also: [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Andreas, Jul 27 '21 at 09:10
Have you tried `df = pd.json_normalize(d, record_path=['line_items'], meta=['order_id', 'email'])`? — Timus, Jul 27 '21 at 10:17

Timus · Accepted Answer · 2021-07-27T12:52:27.837

1

If your dictionary really is that strange, then you could try:

d['line_items'] = eval(d['line_items'][0])
df = pd.json_normalize(d, record_path=['line_items'], meta=['order_id', 'email'])

To create d out of orders you could try:

d = orders.to_dict(orient='list')

Or you could try:

orders.line_items = orders.line_items.map(eval)
d = orders.to_dict(orient='records')
df = pd.json_normalize(d, record_path=['line_items'], meta=['order_id', 'email'])

But: I still don't have a clear picture of the situation :)

edited Jul 27 '21 at 12:52

answered Jul 27 '21 at 10:35

Timus

10,974
5
14
28

Hey, thanks for your answer. This works for d. Thing is: I didn't create the original DF. So I don't have "d" in this case. How would you do it if you had to replace d with the orders dataframe? – joant95 Jul 27 '21 at 11:25
1

Wow, the second one worked. Just to give you an overview what my goal was. We save daily order data on s3 as csv files since it is faster to download the files and concat them instead of querying a nosql database on amazon. Unfortunately the column 'line_items' is a string and not a dict. So my goal was to flatten the product dictionary without losing customer information. Thank you :) – joant95 Jul 27 '21 at 16:01
@joant95 Thanks for the feedback. Makes sense now! – Timus Jul 27 '21 at 17:05

Normalize json column and join with rest of dataframe

1 Answers1