
I have very large JSON data with the following structure:

[
 {
   "origin": 101011001,
   "destinations": [
    {"destination": 101011001, "people": 7378},
    {"destination": 101011002, "people": 120}
   ]
 },
 {
   "origin": 101011002,
   "destinations": [
    {"destination": 101011001, "people": 754},
 }
]


My goal is to convert the data to a pandas DataFrame, which I then want to write to a table in my PostgreSQL database.

I want to create a pandas DataFrame like this:

origin    destination  people
101011001 101011001    7378
101011001 101011002    120 
101011002 101011001    754

Right now, using pandas.read_json(), I can only get the columns 'origin' and 'destinations', where each 'destinations' entry is a list containing both the destination and people values.
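For reference, this is roughly what I am doing at the moment (the file name is just a placeholder for my actual file):

import pandas as pd

# Current attempt: 'destinations' comes back as a raw list of dicts per row
df = pd.read_json("flows.json")  # "flows.json" is a placeholder
print(df.columns)  # Index(['origin', 'destinations'], dtype='object')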

How can I achieve the above dataframe?

Jenny Char

1 Answer


Use json_normalize. This should work as intended:

Edit: parse the string into a list of dicts, then apply json_normalize.

data = """[
 {
   "origin": 101011001,
   "destinations": [
    {"destination": 101011001, "people": 7378},
    {"destination": 101011002, "people": 120}
   ]
 },
 {
   "origin": 101011002,
   "destinations": [
    {"destination": 101011001, "people": 754}]
 }
]"""

from pandas import json_normalize
import json

data = json.loads(data)
df = json_normalize(data,"destinations",['origin'])
df.head()
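Note that json_normalize puts the flattened record columns first, so the result comes out as destination, people, origin. If you then want to load it into PostgreSQL, a minimal sketch using SQLAlchemy could look like this (the connection string and table name are placeholders, adjust them to your setup):

from sqlalchemy import create_engine

# Reorder the columns to match the layout shown in the question
df = df[["origin", "destination", "people"]]

# Placeholder connection string and table name
engine = create_engine("postgresql://user:password@localhost:5432/mydb")
df.to_sql("origin_destination_flows", engine, if_exists="replace", index=False)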
Partha Mandal