I have a large .json file that contains one JSON object per line:
{"_id": "60ddad", "type": ["test"], "company": ["60dd888"], "answers": [], "info": {}, "createdAt": "2021-07-01T11:57:08.492Z","__v": 0}
{"_id": "60deb", "type": ["test"], "company": ["60dea"], "answers": [], "info": {}, "createdAt": "2021-07-02T07:07:27.436Z","__v": 0, "sentence": {}, "text": {}}
{"_id": "60debb2", "type": ["exam"], "company": ["60dea"], "answers": ["option1"], "info": {}, "createdAt": "2021-07-02T07:07:27.451Z", "__v": 0, "sentence": {}, "text": {}}
I am trying to delete the empty struct types, such as "text": {}.
Is there a way to remove all the empty structs? A workaround would be to always drop the specific keys that might contain empty structs, but once in a while those keys contain a non-empty struct.
I was thinking of:
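import json

def empty_structs(dictionary):
    # do things (placeholder: should return the dictionary without its empty structs)
    return dictionary

with open('C:\\my\\path\\file.json', 'r', encoding="utf8") as handle:
    # Read line by line so the whole file never has to fit in memory
    for line in handle:
        if line.strip():  # skip blank lines
            d = json.loads(line)  # each line is one JSON document
            new_d = empty_structs(d)
            json_string = json.dumps(new_d, ensure_ascii=False)
            print(json_string)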
Expected output:
{"_id": "60ddad", "type": ["test"], "company": ["60dd888"], "answers": [], "createdAt": "2021-07-01T11:57:08.492Z","__v": 0}
{"_id": "60deb", "type": ["test"], "company": ["60dea"], "answers": [], "createdAt": "2021-07-02T07:07:27.436Z","__v": 0}
{"_id": "60debb2", "type": ["exam"], "company": ["60dea"], "answers": ["option1"], "createdAt": "2021-07-02T07:07:27.451Z", "__v": 0}