I'm dealing with an API that unfortunately is returning malformed (or "weirdly formed," rather; thanks @fjarri) JSON, but on the positive side I think it may be an opportunity for me to learn something about recursion as well as JSON. It's for an app I use to log my workouts; I'm trying to make a backup script.
I can receive the JSON fine, but even after requests.get(api_url).json() (or json.loads(requests.get(api_url).text)), one of the values is still a JSON-encoded string. Luckily, I can just json.loads() the string and it properly decodes to a dict. The specific key is predictable: timezone_id, whereas its value varies (because data has been logged in multiple timezones). For example, after decoding the outer response, it might be dumped to file as "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}", or loaded into Python as 'timezone_id': '{"name":"America/Denver","seconds":"-21600"}'.
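A minimal illustration of the double encoding, using the sample value above: the first json.loads leaves timezone_id as a str, and a second json.loads turns it into a dict. The record here is a trimmed, made-up stand-in for one checkin. Note that "seconds" stays a string, since it is quoted inside the inner JSON.

```python
import json

# One (trimmed) record as the API returns it: timezone_id is itself a JSON document stored as a string
raw = r'{"timestamp": 1353193745000, "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}"}'

record = json.loads(raw)
print(type(record["timezone_id"]))  # <class 'str'>: still encoded after the first pass

# A second json.loads decodes the inner document into a dict
record["timezone_id"] = json.loads(record["timezone_id"])
print(record["timezone_id"])  # {'name': 'America/Denver', 'seconds': '-21600'}
```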
The problem is that I'm using this API to retrieve a fair amount of data, which has several layers of dicts and lists, and the double-encoded timezone_ids occur at multiple levels.
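If the data is loaded normally first, one option is a plain recursive walk that fixes things up afterwards. This is a sketch assuming timezone_id is the only double-encoded key; decode_timezones is a hypothetical helper name, and data is a trimmed, made-up stand-in for the real response. It builds a new structure rather than mutating the input, and it visits every list element instead of returning after the first one.

```python
import json

def decode_timezones(obj, key="timezone_id"):
    """Hypothetical helper: recursively decode any string stored under `key`.

    Returns a new structure; the input is not modified.
    """
    if isinstance(obj, dict):
        out = {}
        for k, v in obj.items():
            if k == key and isinstance(v, str):
                # The value is a JSON document inside a string: decode it once more
                out[k] = json.loads(v)
            else:
                # Any other value: recurse so nested dicts/lists are handled too
                out[k] = decode_timezones(v, key)
        return out
    if isinstance(obj, list):
        # Recurse into every element (no early return, so all items are visited)
        return [decode_timezones(item, key) for item in obj]
    # Scalars (numbers, other strings, booleans, None) pass through unchanged
    return obj

data = {
    "timezone_id": '{"name":"America/Denver","seconds":"-21600"}',
    "fitness_workout_json": {
        "exercise_logs": [
            {"type": "exercise_log",
             "timezone_id": '{"name":"America/Denver","seconds":"-21600"}'},
        ]
    },
    "meta": {"client_version": "3.0"},
}
clean = decode_timezones(data)
```

Because only values under that one key are decoded, ordinary strings such as "3.0" elsewhere in the data are left alone.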
Here's my work so far with some example data, but it seems like I'm pretty far off base.
#! /usr/bin/env python3
import json
from pprint import pprint
my_input = r"""{
"hasMore": false,
"checkins": [
{
"timestamp": 1353193745000,
"timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
"privacy_groups": [
"private"
],
"meta": {
"client_version": "3.0",
"uuid": "fake_UUID"
},
"client_id": "fake_client_id",
"workout_name": "Workout (Nov 17, 2012)",
"fitness_workout_json": {
"exercise_logs": [
{
"timestamp": 1353195716000,
"type": "exercise_log",
"timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
"workout_log_uuid": "fake_UUID"
},
{
"timestamp": 1353195340000,
"type": "exercise_log",
"timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
"workout_log_uuid": "fake_UUID"
}
]
},
"workout_uuid": ""
},
{
"timestamp": 1354485615000,
"user_id": "fake_ID",
"timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
"privacy_groups": [
"private"
],
"meta": {
"uuid": "fake_UUID"
},
"created": 1372023457376,
"workout_name": "Workout (Dec 02, 2012)",
"fitness_workout_json": {
"exercise_logs": [
{
"timestamp": 1354485615000,
"timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
"workout_log_uuid": "fake_UUID"
},
{
"timestamp": 1354485584000,
"timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
"workout_log_uuid": "fake_UUID"
}
]
},
"workout_uuid": ""
}]}"""
def recurse(obj):
    if isinstance(obj, list):
        for item in obj:
            return recurse(item)
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, str):
                try:
                    v = json.loads(v)
                except ValueError:
                    pass
                obj.update({k: v})
            elif isinstance(v, (dict, list)):
                return recurse(v)

pprint(json.loads(my_input, object_hook=recurse))
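One thing that helps explain why this goes astray: json.loads calls object_hook once for every JSON object, innermost first, and uses whatever the hook returns in place of that object. So the hook never needs to recurse, and a hook that falls off the end (returning None) replaces the object with None. A small sketch with a made-up document; spy_hook and seen are hypothetical names that just record the order in which the hook sees objects:

```python
import json

seen = []  # hypothetical: records the keys of each object in the order the hook receives it

def spy_hook(d):
    # By the time d arrives, everything nested inside it has already been decoded
    seen.append(sorted(d))
    return d  # whatever is returned here replaces d in the final result

doc = '{"outer": 1, "child": {"inner": 2, "grandchild": {"leaf": 3}}}'
result = json.loads(doc, object_hook=spy_hook)
print(seen)  # [['leaf'], ['grandchild', 'inner'], ['child', 'outer']]

# A hook that returns None replaces every object with None
print(json.loads(doc, object_hook=lambda d: None))  # None
```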
Any suggestions for a good way to json.loads() all those double-encoded values without changing the rest of the object? Many thanks in advance!
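On "without changing the rest of the object": running json.loads() on every string and keeping whatever parses isn't safe here, because some ordinary strings in this data are valid JSON on their own. For example, client_version "3.0" would silently become a float. A quick check, where try_decode is a hypothetical helper that mimics the try/except in recurse:

```python
import json

def try_decode(s):
    # Hypothetical helper mimicking recurse: decode if possible, otherwise keep the string
    try:
        return json.loads(s)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return s

print(try_decode("private"))  # private  (not JSON, kept as-is)
print(try_decode("3.0"))      # 3.0      (but now a float, not a str)
print(try_decode(""))         # prints an empty line: json.loads("") raises, so "" is kept
print(try_decode('{"name":"America/Denver","seconds":"-21600"}'))  # {'name': 'America/Denver', 'seconds': '-21600'}
```

Matching on the timezone_id key, rather than decoding whatever happens to parse, avoids that kind of silent type change.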
This post seems to be a good reference: Modifying Deeply-Nested Structures
Edit: This was flagged as a possible duplicate of this question. I think it's fairly different, as I've already demonstrated that using json.loads() was not working. The solution ended up requiring an object_hook, which I've never had to use when decoding JSON and which is not addressed in the prior question.
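For reference, an object_hook along those lines might look like the sketch below. It is not necessarily the exact solution that was used; it assumes timezone_id is the only double-encoded key, decode_timezone_hook is a hypothetical name, and sample is a trimmed, made-up version of the example data. Because json.loads invokes the hook for every object, innermost first, the hook only has to inspect the dict it is given.

```python
import json
from pprint import pprint

def decode_timezone_hook(d):
    """Hypothetical object_hook: decode a JSON-encoded timezone_id in place.

    json.loads calls this for every decoded JSON object (innermost first),
    so dicts nested inside lists are covered without explicit recursion.
    """
    tz = d.get("timezone_id")
    if isinstance(tz, str):
        d["timezone_id"] = json.loads(tz)
    return d  # must return the dict, or json.loads substitutes None

sample = r"""{
  "checkins": [{
    "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
    "meta": {"client_version": "3.0"},
    "fitness_workout_json": {"exercise_logs": [
      {"timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}"}
    ]}
  }]
}"""

data = json.loads(sample, object_hook=decode_timezone_hook)
pprint(data)
```

Since requests' Response.json() forwards keyword arguments to the JSON decoder, requests.get(api_url).json(object_hook=decode_timezone_hook) should apply the same hook directly to the API response.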