0

I have a link which consist of json data as below:

http://www.bom.gov.au/fwo/IDQ60801/IDQ60801.94182.json

Sample Data from link :

"header": [
        {
            "refresh_message": "Issued at 12:02 pm EST Thursday  1 March 2018",
            "ID": "IDQ60801",
            "main_ID": "IDQ60800",
            "name": "Coconut Island",
            "state_time_zone": "QLD",
            "time_zone": "EST",
            "product_name": "Weather Observations",
            "state": "Queensland"
        }
    ],

How can I load the above data into postgres table via python? Should I load it into a dataframe first before loading it into postgres? I'm new to dataframe and pandas so would like to seek your help on it.

My expected table in postgres would be something like below:

ID      | main_id|     name     | state_timezone | time_zone |   product_name     |     state
IDQ60801|IDQ60800|Coconut Island|       QLD      |    EST    |Weather Observations| Queensland

Thanks for your help in advance

Yannic Hamann
  • 4,655
  • 32
  • 50
Derek Lee
  • 475
  • 1
  • 6
  • 20
  • This doesn't seem to be related to pandas or dataframes. You need a postgres driver (e.g. http://initd.org/psycopg/ ) and then you just loop through the JSON list/object, create an appropriate SQL query and run it. – freakish Mar 05 '18 at 20:15

1 Answers1

1

Skip the dataframe and pandas.

Use Python's json module to parse the data. That will give you the data in the form of a dictionary.

import json
import requests

json_response = requests.get('http://www.bom.gov.au/fwo/IDQ60801/IDQ60801.94182.json')
json_content = json.loads(json_response.content)

# navigate the json, which are nested lists of dicts
# this below gives you the first, and only, header-dict
header_dict = json_content['observations']['header'][0]

With your data in a dictionary you can follow this great answer on how to get it into your DB: Insert Python Dictionary using Psycopg2

Karl Anka
  • 2,529
  • 1
  • 19
  • 30
  • How can i extract the values from the "data" dict then? as there are multiple { } in the "data" dict – Derek Lee Mar 05 '18 at 20:41
  • They are multiple dicts in this list: `json_content['observations']['data']`. So just loop over the list to get each dict with: `for data_dict in json_content['observations']['data']:` – Karl Anka Mar 05 '18 at 20:43
  • I have been using this : `data = json_content['observations']['data'][0]` `for data2 in data: print(data2)` `Result: sort_order, wmo, name, history_product, local_date_time, local_date_time_full,` With the above, i only get the header, how can i get the value as well? – Derek Lee Mar 05 '18 at 20:50
  • `data = json_content['observations']['data'][0]` skip the zero at the end here. Your json-data has three items which are all lists of dictionaries: `notice`,`header`, and `data`. However, `notice` and `header` only contain one dict each while `data` has 145. With your code above you assign the first dict in the data-list to your variable `data` and then loop over this sole dict's keys. – Karl Anka Mar 05 '18 at 20:59
  • Thanks for the info, now i'm able to get all the details in `data` . Just to add 1 more question, how am i able to only print 1 header's details from `data = json_content['observations']['data']' ? For example, I want to only get all `local_date_time` from all dict in `data` – Derek Lee Mar 05 '18 at 21:16
  • Access them using their respective keys. For `local_date_time` use: `for data_dict in json_content['observations']['data']: local_date_time = data_dict['local_date_time']` – Karl Anka Mar 05 '18 at 21:19
  • 1
    Hey, i got the answer for that, I used the below code : `data = json_content['observations']['data'] for result in data: print(result['local_date_time'])` – Derek Lee Mar 05 '18 at 21:19
  • 1
    Really thanks for your help Karl, appreciate it a lot :) – Derek Lee Mar 05 '18 at 21:22