0

how to flattening JSON to pd.dataframe like this:

class_id|id |schedule_id |schedule_date |lesson_price |status`
    1   | 3 |    1       | 2017-07-11   |   USD 25    | ONGOING
    1   | 3 |    2       | 2016-09-24   |   USD 15    | OPEN REGISTRATION
    1   | 4 |    1       | 2016-12-17   |   USD 19    | ONGOING
    1   | 4 |    2       | 2015-11-12   |   USD 29    | ONGOING
    1   | 4 |    3       | 2015-11-10   |   USD 14    | ON SCHEDULE
    2   | 1 |    1       | 2017-05-21   |   USD 50    | CANCELLED
    2   | 2 |    1       | 2017-06-04   |   USD10     | FINISHED
    2   | 2 |    2       | 2018-03-01   |   USD12     | CLOSED

from JSON

I've tried from this reference but I give me 2 line groupby class_id

how to show all data schedule with class_id and id from lesson object like the desired dataframe?

frhdn
  • 41
  • 5

1 Answers1

0

The difficulty in your data structure comes from

{
  "lesson3": {
    "id": 3,
    "schedule": [
      {
        "schedule_id": "1",
        "schedule_date": "2017-07-11",
        "lesson_price": "USD 25",
        "status": "ONGOING"
      },
      {
        "schedule_id": "2",
        "schedule_date": "2016-09-24",
        "lesson_price": "USD 15",
        "status": "OPEN REGISTRATION"
      }
    ]
  }
}

It would be better to have

{
  "name": "lesson3",
  "id": 3,
  "schedule": [
    {
      "schedule_id": "1",
      "schedule_date": "2017-07-11",
      "lesson_price": "USD 25",
      "status": "ONGOING"
    },
    {
      "schedule_id": "2",
      "schedule_date": "2016-09-24",
      "lesson_price": "USD 15",
      "status": "OPEN REGISTRATION"
    }
  ]
}

But we don't have control on the data we get most of the time. So we have to get rid of the lesson1, lesson2 keys and move the object up.

Solution

import requests
data = requests.get(url).json()

Extract the distinct lessons

data_ = [{'class_id': c['class_id'], 'lessons': v} for c in data['class'] for d, v in c['data'].items()]

The data looks like this now

[
  {
    "class_id": "1",
    "lessons": {
      "id": 3,
      "schedule": [
        {
          "schedule_id": "1",
          "schedule_date": "2017-07-11",
          "lesson_price": "USD 25",
          "status": "ONGOING"
        },
        {
          "schedule_id": "2",
          "schedule_date": "2016-09-24",
          "lesson_price": "USD 15",
          "status": "OPEN REGISTRATION"
        }
      ]
    }
  },
  ...
]

Now we can read it into pandas DataFrame using json_normalize

df = json_normalize(data_, record_path=['lessons', 'schedule'], meta=['class_id', ['lessons', 'id']])

Output

  schedule_id schedule_date lesson_price             status class_id lessons.id
0           1    2017-07-11       USD 25            ONGOING        1          3
1           2    2016-09-24       USD 15  OPEN REGISTRATION        1          3
2           1    2016-12-17       USD 19            ONGOING        1          4
3           2    2015-11-12       USD 29            ONGOING        1          4
4           3    2015-11-10       USD 14        ON SCHEDULE        1          4
5           1    2017-05-21       USD 50          CANCELLED        2          1
6           1    2017-06-04        USD10           FINISHED        2          2
7           5    2018-03-01        USD12             CLOSED        2          2
megges
  • 570
  • 1
  • 3
  • 11