I currently have a function that is supposed to update every dictionary key where the value is the old _id to be the value of a new uuid.
This function works as intended but isn't efficient and takes too long to execute with my current dataset.
What has to happen:
- Each
dict['_id']['$oid']
has to be changed to a new uuid. - Every
$oid
that matches the old value must be changed to new uuid.
The problem is that there are many $oid
keys in every dictionary and I want to make sure this works correctly while being more efficient
As of right now the script takes 45-55 seconds to execute with my current dataset and it's making 45 million calls to the recursive function recursive_id_replace
How can I optimize the current function to get this script to run faster?
The data gets passed into this functions as a dictionary that looks like:
file_data = {
'some_file_name.json': [{...}],
# ...
}
Here's an example of one of the dictionaries in the dataset:
[
{
"_id": {
"$oid": "4e0286ed2f1d40f78a037c41"
},
"code": "HOME",
"name": "Home Address",
"description": "Home Address",
"entity_id": {
"$oid": "58f4eb19736c128b640d5844"
},
"is_master": true,
"is_active": true,
"company_id": {
"$oid": "591082232801952100d9f0c7"
},
"created_at": {
"$date": "2017-05-08T14:35:19.958Z"
},
"created_by": {
"$oid": "590b7fd32801952100d9f051"
},
"updated_at": {
"$date": "2017-05-08T14:35:19.958Z"
},
"updated_by": {
"$oid": "590b7fd32801952100d9f051"
}
},
{
"_id": {
"$oid": "01c593700e704f29be1e3e23"
},
"code": "MAIL",
"name": "Mailing Address",
"description": "Mailing Address",
"entity_id": {
"$oid": "58f4eb1b736c128b640d5845"
},
"is_master": true,
"is_active": true,
"company_id": {
"$oid": "591082232801952100d9f0c7"
},
"created_at": {
"$date": "2017-05-08T14:35:19.980Z"
},
"created_by": {
"$oid": "590b7fd32801952100d9f051"
},
"updated_at": {
"$date": "2017-05-08T14:35:19.980Z"
},
"updated_by": {
"$oid": "590b7fd32801952100d9f051"
}
}
]
Heres the function:
def id_replace(_ids: set, uuids: list, file_data: dict) -> dict:
""" Replaces all old _id's with new uuid.
checks all data keys for old referenced _id and updates it to new uuid value.
"""
def recursive_id_replace(prev_val, new_val, data: Any):
"""
Replaces _ids recursively.
"""
data_type = type(data)
if data_type == list:
for item in data:
recursive_id_replace(prev_val, new_val, item)
if data_type == dict:
for key, val in data.items():
val_type = type(val)
if key == '$oid' and val == prev_val:
data[key] = new_val
elif val_type == dict:
recursive_id_replace(prev_val, new_val, val)
elif val_type == list:
for item in val:
recursive_id_replace(prev_val, new_val, item)
for i, _id in enumerate(_ids):
recursive_id_replace(_id, uuids[i], file_data)
return file_data