
What I am trying to do is extract elevation data from the Google Maps API along a path specified by latitude and longitude coordinates, as follows:

from urllib2 import Request, urlopen
import json

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()

This gives me data that looks like this:

elevations.splitlines()

['{',
 '   "results" : [',
 '      {',
 '         "elevation" : 243.3462677001953,',
 '         "location" : {',
 '            "lat" : 42.974049,',
 '            "lng" : -81.205203',
 '         },',
 '         "resolution" : 19.08790397644043',
 '      },',
 '      {',
 '         "elevation" : 244.1318664550781,',
 '         "location" : {',
 '            "lat" : 42.974298,',
 '            "lng" : -81.19575500000001',
 '         },',
 '         "resolution" : 19.08790397644043',
 '      }',
 '   ],',
 '   "status" : "OK"',
 '}']

When putting it into a DataFrame, here is what I get:

pd.read_json(elevations)

[screenshot: a DataFrame with "results" and "status" columns, each cell holding a nested dict]

and here is what I want:

[screenshot: a DataFrame with separate latitude, longitude, and elevation columns]

I'm not sure if this is possible, but mainly what I am looking for is a way to put the elevation, latitude, and longitude data together in a pandas dataframe (it doesn't have to have fancy multiline headers).

If anyone can help or give some advice on working with this data, that would be great! If you can't tell, I haven't worked much with json data before...

EDIT:

This method isn't all that attractive but seems to work:

import pandas as pd

data = json.loads(elevations)
lat, lng, el = [], [], []
for result in data['results']:
    lat.append(result['location']['lat'])
    lng.append(result['location']['lng'])
    el.append(result['elevation'])
df = pd.DataFrame([lat, lng, el]).T
df.columns = ['latitude', 'longitude', 'elevation']

which ends up with a dataframe having the columns latitude, longitude, and elevation.


– pbreach

14 Answers


I found a quick and easy solution to what I wanted using json_normalize(), available as pd.json_normalize() since pandas 1.0.

from urllib2 import Request, urlopen  # Python 2; on Python 3 use urllib.request instead
import json

import pandas as pd

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request = Request('http://maps.googleapis.com/maps/api/elevation/json?locations=' + path1 + '&sensor=false')
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
# pd.json_normalize needs pandas >= 1.0; on older versions: from pandas.io.json import json_normalize
df = pd.json_normalize(data['results'])

This gives a nice flattened dataframe with the json data that I got from the Google Maps API.
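
For the two sample points in the question, the flattened frame should look roughly like this (the dotted column names are produced by json_normalize; the values are taken from the JSON above):

    elevation  resolution  location.lat  location.lng
0  243.346268   19.087904     42.974049    -81.205203
1  244.131866   19.087904     42.974298    -81.195755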

– pbreach
  • This no longer seems to work — I had to use `pd.DataFrame.from_records()` as described here: http://stackoverflow.com/a/33020669/1137803 – avv Mar 05 '17 at 20:15
  • from_records also doesn't work at times if the json is sufficiently complex; you have to apply pandas.io.json.json_normalize to get a flat map. Check out https://stackoverflow.com/questions/39899005/how-to-flatten-a-pandas-dataframe-with-some-columns-as-json/39906235 – devssh May 28 '18 at 09:27

Check this snippet out.

import json
import pandas as pd

# read the JSON data using json.load()
file = 'data.json'
with open(file) as train_file:
    dict_train = json.load(train_file)

# convert the json dataset from a dictionary to a dataframe
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)
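
For intuition, orient='index' turns the top-level keys into row labels rather than column labels. A minimal sketch, with made-up data standing in for data.json:

import pandas as pd

dict_train = {'a': {'x': 1, 'y': 2}, 'b': {'x': 3, 'y': 4}}
train = pd.DataFrame.from_dict(dict_train, orient='index')
print(train)
#    x  y
# a  1  2
# b  3  4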

Hope it helps :)

– Rishu Shrivastava
  • Error. You should pass the file contents (i.e. a string) to json.loads(), not the file object itself: json.loads(train_file.read()) – Vasin Yuriy Nov 01 '19 at 12:12
  • If the .json file gives you problems, replace the last two lines of this snippet with this answer: https://stackoverflow.com/questions/59695056/the-error-attributeerror-list-object-has-no-attribute-values-appears-when – liorr Oct 08 '22 at 17:37
  • I came for the orient='index' option. Thank you very much. – Athachai Dec 01 '22 at 12:23

Optimization of the accepted answer:

The accepted answer has some problems running as-is, so I want to share my code, which does not rely on urllib2:

import requests
from pandas import json_normalize
url = 'https://www.energidataservice.dk/proxy/api/datastore_search?resource_id=nordpoolmarket&limit=5'

response = requests.get(url)
dictr = response.json()
recs = dictr['result']['records']
df = json_normalize(recs)
print(df)

Output:

        _id                    HourUTC               HourDK  ... ElbasAveragePriceEUR  ElbasMaxPriceEUR  ElbasMinPriceEUR
0    264028  2019-01-01T00:00:00+00:00  2019-01-01T01:00:00  ...                  NaN               NaN               NaN
1    138428  2017-09-03T15:00:00+00:00  2017-09-03T17:00:00  ...                33.28              33.4              32.0
2    138429  2017-09-03T16:00:00+00:00  2017-09-03T18:00:00  ...                35.20              35.7              34.9
3    138430  2017-09-03T17:00:00+00:00  2017-09-03T19:00:00  ...                37.50              37.8              37.3
4    138431  2017-09-03T18:00:00+00:00  2017-09-03T20:00:00  ...                39.65              42.9              35.3
..      ...                        ...                  ...  ...                  ...               ...               ...
995  139290  2017-10-09T13:00:00+00:00  2017-10-09T15:00:00  ...                38.40              38.4              38.4
996  139291  2017-10-09T14:00:00+00:00  2017-10-09T16:00:00  ...                41.90              44.3              33.9
997  139292  2017-10-09T15:00:00+00:00  2017-10-09T17:00:00  ...                46.26              49.5              41.4
998  139293  2017-10-09T16:00:00+00:00  2017-10-09T18:00:00  ...                56.22              58.5              49.1
999  139294  2017-10-09T17:00:00+00:00  2017-10-09T19:00:00  ...                56.71              65.4              42.2 

PS: the API serves Danish electricity prices.

– DisabledWhale
  • this solution is better because you can just focus on the `pandas` package without importing other packages – KY Lu Aug 08 '20 at 12:46

Just a new version of the accepted answer, since Python 3.x does not support urllib2:

import requests
from pandas.io.json import json_normalize

path1 = '42.974049,-81.205203|42.974298,-81.195755'
response = requests.get('http://maps.googleapis.com/maps/api/elevation/json?locations=' + path1 + '&sensor=false')
data = response.json()  # response.json() already parses the body into a dict, so no json.loads() is needed
json_normalize(data['results'])
– Abhishake Gupta

You could first load your json data into a Python dictionary:

data = json.loads(elevations)

Then modify the data on the fly:

for result in data['results']:
    result[u'lat']=result[u'location'][u'lat']
    result[u'lng']=result[u'location'][u'lng']
    del result[u'location']

Rebuild the json string:

elevations = json.dumps(data)

Finally:

pd.read_json(elevations)

You can probably also avoid dumping the data back to a string; pandas can create a DataFrame directly from a dictionary (I haven't used it in a long time :p)
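
For instance, after the loop above each entry in data['results'] is already flat, so a minimal sketch of the direct route (assuming pandas is imported as pd) would be:

df = pd.DataFrame(data['results'])  # columns: elevation, resolution, lat, lng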

– Raphaël Braud
  • I still end up with the same result using the json data and the dictionary that was created. It seems like each element in the dataframe has its own dict. I tried using your approach in a less attractive way, building a separate list for lat, lng, and elevation while iterating through 'data'. – pbreach Jan 14 '14 at 05:12
  • @user2593236: Hello, I made an error while copy/pasting my code into SO: a del was missing (answer edited) – Raphaël Braud Jan 14 '14 at 08:37
  • Hmm.. Still the same thing where it has 'results' and 'status' as headers while the rest of the json data appear as dicts in each cell. I think the solution to this problem would be to change the format of the data so that it is not subdivided into 'results' and 'status' then the data frame will use the 'lat', 'lng', 'elevation', 'resolution' as the separate headers. Either that, or I will need to find a way to load the json data into a dataframe that will have a multilevel header index as I mentioned in the question. – pbreach Jan 14 '14 at 18:47
  • Which final table do you expect? The one you got after your edit? – Raphaël Braud Jan 14 '14 at 22:51
  • The one I got after my final edit does the job, basically all I needed was to get the data in a tabular format that I can export and work with – pbreach Jan 16 '14 at 16:48

Use json to parse the string, then convert it to a pandas dataframe using the DataFrame.from_dict function:

import json
import pandas as pd

json_string = '{ "name":"John", "age":30, "car":"None" }'

a_json = json.loads(json_string)
print(a_json)

# all the values here are scalars, so an index must be supplied via orient='index'
dataframe = pd.DataFrame.from_dict(a_json, orient='index')
– Emeka Boris Ama

Here is a small utility class that converts JSON to a DataFrame and back. Hope you find it helpful.

# -*- coding: utf-8 -*-
from pandas.io.json import json_normalize

class DFConverter:

    #Converts the input JSON to a DataFrame
    def convertToDF(self,dfJSON):
        return(json_normalize(dfJSON))

    #Converts the input DataFrame to JSON 
    def convertToJSON(self, df):
        resultJSON = df.to_json(orient='records')
        return(resultJSON)
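
A quick usage sketch (the nested record below is made up for illustration):

converter = DFConverter()
df = converter.convertToDF([{'a': 1, 'b': {'c': 2}}])   # columns: a, b.c
print(converter.convertToJSON(df))                      # [{"a":1,"b.c":2}]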
– Siva

The problem is that you have several columns in the data frame that contain dicts (with smaller dicts inside them). Useful JSON is often heavily nested. I have been writing small functions that pull the info I want out into a new column; that way I have it in the format I want to use.

for row in range(len(data)):
    #First I load the dict (one at a time)
    n = data.loc[row,'dict_column']
    #Now I make a new column that pulls out the data that I want.
    data.loc[row,'new_column'] = n.get('key')
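
The same idea can be written without the explicit loop (same hypothetical column and key names as above):

data['new_column'] = data['dict_column'].apply(lambda d: d.get('key'))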
– billmanH
  • Per @niltoid, I may have used an older version of Pandas when I wrote this in 2014. Pandas has changed how `.loc` and `.iloc` work, and you may need to adjust. See that adjustment below. – billmanH May 07 '21 at 16:11

billmanH's solution helped me, but it didn't work until I switched from:

n = data.loc[row,'json_column']

to:

n = data.iloc[[row]]['json_column']

Here's the rest of it; converting to a dictionary is helpful for working with json data.

import json

for row in range(len(data)):
    n = data.iloc[[row]]['json_column'].item()
    jsonDict = json.loads(n)
    if ('mykey' in jsonDict):
        display(jsonDict['mykey'])  # display() is a Jupyter/IPython built-in; use print() outside notebooks
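
A more compact variant of the same lookup (assuming every row of json_column holds a JSON string):

parsed = data['json_column'].map(json.loads)
data['mykey'] = parsed.map(lambda d: d.get('mykey'))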
– niltoid
# Use this small trick to make the data json-interpretable,
# since the raw lines are not directly parseable by json.loads()

>>> import json
>>> f=open("sampledata.txt","r+")
>>> data = f.read()
>>> for x in data.split("\n"):
...     strlist = "["+x+"]"
...     datalist=json.loads(strlist)
...     for y in datalist:
...             print(type(y))
...             print(y)
...
...
<type 'dict'>
{u'0': [[10.8, 36.0], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'1': [[10.8, 36.1], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'2': [[10.8, 36.2], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'3': [[10.8, 36.300000000000004], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'4': [[10.8, 36.4], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'5': [[10.8, 36.5], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'6': [[10.8, 36.6], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'7': [[10.8, 36.7], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'8': [[10.8, 36.800000000000004], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'9': [[10.8, 36.9], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}



Once you have the flattened DataFrame obtained by the accepted answer, you can make the columns a MultiIndex ("fancy multiline header") like this:

df.columns = pd.MultiIndex.from_tuples([tuple(c.split('.')) for c in df.columns])
– loganbvh

I prefer a more generic method, in case the user doesn't want to give the key 'results'. You can still flatten it with a recursive approach that finds the key holding the nested data, which also helps when you do have the key but the JSON is very nested. Something like:

import json
from pandas import json_normalize

def findnestedlist(js):
    # return the first list found anywhere in the (possibly nested) dict
    for i in js.keys():
        if isinstance(js[i], list):
            return js[i]
    for v in js.values():
        if isinstance(v, dict):
            return findnestedlist(v)


def recursive_lookup(k, d):
    if k in d:
        return d[k]
    for v in d.values():
        if isinstance(v, dict):
            return recursive_lookup(k, v)
    return None

def flat_json(content, key):
    js = json.loads(content)
    if key is None or key == '':
        nested_list = findnestedlist(js)
    else:
        nested_list = recursive_lookup(key, js)
    return json_normalize(nested_list, sep="_")

key = "results"  # If you don't have it, pass None

csv_data = flat_json(your_json_string, key)
print(csv_data)
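
A quick check with a toy payload (the structure is made up for illustration):

import json

your_json_string = json.dumps({'status': 'OK',
                               'results': [{'location': {'lat': 1, 'lng': 2}, 'elevation': 3}]})
print(flat_json(your_json_string, 'results'))
#    elevation  location_lat  location_lng
# 0          3             1             2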
– MD Rijwan

Rumble supports JSON natively with JSONiq and runs on Spark, managing DataFrames internally so you don't need to -- even if the data isn't fully structured:

let $coords := "42.974049,-81.205203%7C42.974298,-81.195755"
let $request := json-doc("http://maps.googleapis.com/maps/api/elevation/json?locations="||$coords||"&sensor=false")
for $obj in $request.results[]
return {
  "latitude" : $obj.location.lat,
  "longitude" : $obj.location.lng,
  "elevation" : $obj.elevation
}

The results can be exported to CSV and then reopened in any other host language as a DataFrame.

– Ghislain Fourny

Referring to the MongoDB documentation, I got the following code:

import json
from pandas import DataFrame

data = json.loads('Your json string')  # parse the JSON first; DataFrame() does not accept a raw JSON string
df = DataFrame(data)
– vincent