I'm dealing with a CSV with JSON data for the first time. I have uploaded the CSV file into Python (and I try also in R) as a dataframe. The CSV looks like this:
This file is about sales. In every row I have the purchase details. One client can buy several products. I have lots of columns with details like delivery address, payment, etc. The column ITEMS has JSON data inside (in Python is an Object type). What I want is to flatten that column into several columns and join them into the original dataframe. The JSON looks like this (Notice that this client have bought two items, so I'll have to join the new columns and "repeat" the row with the client info):
[
{
"_id": {
"$oid": "5fff7d777bdb300010b1f172"
},
"product": {
"productCategories": [
{
"$oid": "2afe7b8b3a2c4e1b4a8b22f5"
},
{
"$oid": "ca9bf1e1d9adf5b170b12fae"
}
],
"_id": {
"$oid": "5ffc9c35fc8985001803d8b5"
},
"title": "Rice",
"productType": {
"$oid": "e1f2a11798b7f81ce44a5696"
},
"store": {
"$oid": "5ffb729242ad2200107dcb42"
},
"price": 50,
"stockMethod": {
"_id": {
"$oid": "5ffc9c49fc8985001803d8b7"
},
"value": "always"
},
"stock": null,
"showInStore": true,
"available": true,
"createdAt": {
"$date": "2021-01-11T18:43:01.103Z"
},
"modifiedAt": {
"$date": "2021-01-13T23:08:39.102Z"
},
"__v": 0
},
"quantity": 1
},
{
"_id": {
"$oid": "5fff7d777bdb300010b1f175"
},
"product": {
"productCategories": [
{
"$oid": "d4abd6888580e00d9f746cac"
},
{
"$oid": "a018049c6fc0253ee5aa6a8f"
}
],
"_id": {
"$oid": "5ffcaa90fc8985001803d97c"
},
"title": "Beans",
"productType": {
"$oid": "e1f2a11798b7f81ce44a5696"
},
"store": {
"$oid": "5ffb729242ad2200107dcb42"
},
"price": 130,
"stockMethod": {
"_id": {
"$oid": "5ffcaa90fc8985001803d97d"
},
"value": "alwasy"
},
"stock": null,
"showInStore": true,
"available": true,
"createdAt": {
"$date": "2021-01-11T19:44:16.004Z"
},
"modifiedAt": {
"$date": "2021-01-13T23:08:39.102Z"
},
"__v": 0
},
"quantity": 1
}
]
I have tried different solutions like this and this but I can't solve it. For example, I've tried:
import json
from io import StringIO
stdf = orders['items'].apply(json.loads)
stlst = list(stdf)
stjson = json.dumps(stlst)
pd.read_json(StringIO(stjson))
And I get something like: