1

I have called some data from the World Health Organisation in JSON format, I would like to read this into a Pandas DataFrame.

I have called this from this page: WHO Measles First Dose Rate

{u'dimension': [{u'display': u'Indicator', u'label': u'GHO'},
  {u'display': u'PUBLISH STATES', u'label': u'PUBLISHSTATE'},
  {u'display': u'Year', u'label': u'YEAR'},
  {u'display': u'WHO region', u'label': u'REGION'},
  {u'display': u'World Bank income group', u'label': u'WORLDBANKINCOMEGROUP'},
  {u'display': u'Country', u'label': u'COUNTRY'}],
 u'fact': [{u'Value': u'25',
   u'dim': {u'COUNTRY': u'Afghanistan',
    u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
    u'PUBLISHSTATE': u'Published',
    u'REGION': u'Eastern Mediterranean',
    u'WORLDBANKINCOMEGROUP': u'Low-income',
    u'YEAR': u'1993'}},
  {u'Value': u'57',
   u'dim': {u'COUNTRY': u'Afghanistan',
    u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
    u'PUBLISHSTATE': u'Published',
    u'REGION': u'Eastern Mediterranean',
    u'WORLDBANKINCOMEGROUP': u'Low-income',
    u'YEAR': u'2013'}},
  {u'Value': u'62',
   u'dim': {u'COUNTRY': u'Angola',
    u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
    u'PUBLISHSTATE': u'Published',
    u'REGION': u'Africa',
    u'WORLDBANKINCOMEGROUP': u'Upper-middle-income',
    u'YEAR': u'1996'}},
  {u'Value': u'94',
   u'dim': {u'COUNTRY': u'Andorra',
    u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
    u'PUBLISHSTATE': u'Published',
    u'REGION': u'Europe',
    u'WORLDBANKINCOMEGROUP': u'High-income',
    u'YEAR': u'2005'}},
  {u'Value': u'34',
   u'dim': {u'COUNTRY': u'United Arab Emirates',
    u'GHO': u'Measles-containing-vaccine first-dose (MCV1) immunization coverage among 1-year-olds (%)',
    u'PUBLISHSTATE': u'Published',
    u'REGION': u'Eastern Mediterranean',
    u'WORLDBANKINCOMEGROUP': u'High-income',
    u'YEAR': u'1980'}},

I have tried

#Setting Up and loading JSON into object ready to turn into dataframe
url = "http://apps.who.int/gho/athena/data/GHO/WHS8_110.json?profile=simple&filter=COUNTRY:*"
response = requests.get(url)
response_json = response.content
json.loads(response_json)
whoDataSetVaccinationRate = json.loads(response_json)    

#Attempt to load JSON Data into Pandas Dataframe
whoDataSetVaccinationRateDF = pd.DataFrame(whoDataSetVaccinationRate['fact']
, columns=['COUNTRY', 'YEAR','REGION'])
whoDataSetVaccinationRateDF

And that seems to read - but I only get NaN values in the dataframe for COUNTRY and YEAR: DataFrameError

And I realise I want it laid out differently in the dataframe anyway - and I'm not sure how to call that. This is how I want my dataframe to look: DataFrameGood

kiltannen
  • 1,057
  • 2
  • 12
  • 27

1 Answers1

3

Use json_normalize with pivot:

from pandas.io.json import json_normalize   
import urllib.request, json 

#https://stackoverflow.com/a/12965254
url = "http://apps.who.int/gho/athena/data/GHO/WHS8_110.json?profile=simple&filter=COUNTRY:*"

with urllib.request.urlopen(url) as url:
    data = json.loads(url.read().decode())

df = json_normalize(data['fact']).pivot('dim.COUNTRY','dim.YEAR','Value').astype(float)
print (df.head())

dim.YEAR     1980  1981  1982  1983  1984  1985  1986  1987  1988  1989  ...   \
dim.COUNTRY                                                              ...    
Afghanistan  11.0   NaN   8.0   9.0  14.0  14.0  14.0  31.0  34.0  22.0  ...    
Albania      90.0  90.0  93.0  96.0  96.0  96.0  96.0  96.0  96.0  96.0  ...    
Algeria       NaN   NaN   NaN   NaN   NaN  68.0  67.0  73.0  81.0  82.0  ...    
Andorra       NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...    
Angola        NaN   NaN   NaN  26.0  35.0  44.0  44.0  55.0  56.0  48.0  ...    

dim.YEAR     2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  
dim.COUNTRY                                                              
Afghanistan  55.0  59.0  60.0  62.0  64.0  59.0  57.0  60.0  62.0  62.0  
Albania      97.0  98.0  97.0  99.0  99.0  98.0  99.0  98.0  97.0  96.0  
Algeria      92.0  88.0  92.0  95.0  95.0  95.0  95.0  95.0  95.0  94.0  
Andorra      94.0  98.0  98.0  99.0  99.0  98.0  95.0  96.0  96.0  97.0  
Angola       71.0  61.0  57.0  72.0  64.0  72.0  66.0  60.0  55.0  49.0  

[5 rows x 37 columns]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This is AWESOME! You are truly a lifesaver - this is exactly what i was trying to do. The extra information you added on a better method to read the JSON in is also very elegant code - Thank You – kiltannen Apr 10 '18 at 11:54
  • Unfortunately the code you have provided is python 3 specific - I am limited to python 2.7. I am trying to modify using urllib2.urlopen but not having much luck :( Maybe you can suggest a method in 2.7? – kiltannen Apr 13 '18 at 11:42
  • 1
    @kiltannen - Solution from `https://stackoverflow.com/a/12965254` for python2 does not work? – jezrael Apr 13 '18 at 11:44
  • 1
    Thank you - working my way through this now. Will get back to you. Thx Heaps for all the help – kiltannen Apr 13 '18 at 11:54
  • Well that did indeed work! I should have spent a bit more time working it through. Once again - thank you for the lend of your genius skills! – kiltannen Apr 13 '18 at 12:04
  • 1
    @kiltannen - Good news, super! – jezrael Apr 13 '18 at 12:04