I have a problem: I want to normalize a list of dicts with pd.json_normalize(...), but I get a MemoryError. Is there a way to work around this error? It works with pd.json_normalize(my_data[:2000000], sep="_"), but not with the complete data (2549150 entries).
I already looked at MemoryError: Unable to allocate MiB for an array with shape and data type, when using anymodel.fit() in sklearn, and Unable to allocate array with shape and data type.
A minimal example of my data:
import pandas as pd

my_data = [
{'_id': 'orders/213123',
'contactEditor': {'name': 'Max Power',
'phone': '1234567',
'email': 'max@power.com'},
'contactSoldToParty': {'name': 'Max Not',
'phone': '123456789',
'email': 'maxnot@power.com'},
'isCompleteDelivery': False,
'metaData': {'dataOriginSystem': 'Goods',
'dataOriginWasCreatedTime': '10:12:12',},
'orderDate': '2021-02-22',
'orderDateBuyer': '2021-02-22',
},
{'_id': 'orders/12323',
'contactEditor': {'name': 'Max Power2',
'phone': '1234567',
'email': 'max@power.com'},
'contactSoldToParty': {'name': 'Max Not',
'phone': '123456789',
'email': 'maxnot@power.com'},
'isCompleteDelivery': False,
'metaData': {'dataOriginSystem': 'Goods',
'dataOriginWasCreatedTime': '10:12:12',},
'orderDate': '2021-02-22',
'orderDateBuyer': '2021-02-22',
},
]
df = pd.json_normalize(my_data, sep="_")
[OUT]
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11136/3519902863.py in <module>
----> 1 df= pd.json_normalize(my_data, sep='_')
MemoryError: Unable to allocate 1.88 GiB for an array with shape (2549150, 99) and data type object
What I want
id contactEditor_name contactEditor_phone contactEditor_email ...
orders/213123 Max Power ... ... ...
orders/12323 Max Power2 ... ... ...
For reference, len(my_data) is 2549150.
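One workaround I considered, since the slice my_data[:2000000] worked, is normalizing in chunks and concatenating the results. This is only a sketch (the chunk_size of 500000 is an arbitrary value I picked), and I am not sure whether the final pd.concat runs into the same allocation limit:

import pandas as pd

chunk_size = 500000  # arbitrary; each slice is normalized separately
frames = []
for start in range(0, len(my_data), chunk_size):
    # normalize one slice at a time instead of the whole list at once
    frames.append(pd.json_normalize(my_data[start:start + chunk_size], sep="_"))
df = pd.concat(frames, ignore_index=True)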
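Another idea: the error reports 99 columns, but I only need a few of them, so maybe pre-selecting keys before normalizing would shrink the footprint. A sketch (the keys list here is a hypothetical subset, not my real schema):

import pandas as pd

keys = ["_id", "contactEditor", "orderDate"]  # hypothetical subset of the real keys
# keep only the wanted top-level keys from each record before normalizing
slim = [{k: d.get(k) for k in keys} for d in my_data]
df = pd.json_normalize(slim, sep="_")

Would either of these be a sensible way around the MemoryError, or is there a better option built into pandas?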