The difficulty in your data structure comes from
{
"lesson3": {
"id": 3,
"schedule": [
{
"schedule_id": "1",
"schedule_date": "2017-07-11",
"lesson_price": "USD 25",
"status": "ONGOING"
},
{
"schedule_id": "2",
"schedule_date": "2016-09-24",
"lesson_price": "USD 15",
"status": "OPEN REGISTRATION"
}
]
}
}
It would be better to have
{
"name": "lesson3",
"id": 3,
"schedule": [
{
"schedule_id": "1",
"schedule_date": "2017-07-11",
"lesson_price": "USD 25",
"status": "ONGOING"
},
{
"schedule_id": "2",
"schedule_date": "2016-09-24",
"lesson_price": "USD 15",
"status": "OPEN REGISTRATION"
}
]
}
But we don't have control on the data we get most of the time. So we have to get rid of the lesson1, lesson2 keys and move the object up.
Solution
import requests
data = requests.get(url).json()
Extract the distinct lessons
data_ = [{'class_id': c['class_id'], 'lessons': v} for c in data['class'] for d, v in c['data'].items()]
The data looks like this now
[
{
"class_id": "1",
"lessons": {
"id": 3,
"schedule": [
{
"schedule_id": "1",
"schedule_date": "2017-07-11",
"lesson_price": "USD 25",
"status": "ONGOING"
},
{
"schedule_id": "2",
"schedule_date": "2016-09-24",
"lesson_price": "USD 15",
"status": "OPEN REGISTRATION"
}
]
}
},
...
]
Now we can read it into pandas DataFrame using json_normalize
df = json_normalize(data_, record_path=['lessons', 'schedule'], meta=['class_id', ['lessons', 'id']])
Output
schedule_id schedule_date lesson_price status class_id lessons.id
0 1 2017-07-11 USD 25 ONGOING 1 3
1 2 2016-09-24 USD 15 OPEN REGISTRATION 1 3
2 1 2016-12-17 USD 19 ONGOING 1 4
3 2 2015-11-12 USD 29 ONGOING 1 4
4 3 2015-11-10 USD 14 ON SCHEDULE 1 4
5 1 2017-05-21 USD 50 CANCELLED 2 1
6 1 2017-06-04 USD10 FINISHED 2 2
7 5 2018-03-01 USD12 CLOSED 2 2