
I downloaded my location history JSON from Google Maps and want to load all of the available fields into a pandas DataFrame.

df['locations'][5] yields the following:

{'timestampMs': '1540084102574',
 'latitudeE7': 327160442,
 'longitudeE7': -1171687098,
 'accuracy': 17,
 'altitude': -13,
 'verticalAccuracy': 3,
 'activity': [{'timestampMs': '1540083982124',
   'activity': [{'type': 'STILL', 'confidence': 100}]}]}

I'm able to map the timestampMs, latitude, and longitude with no problem using:

df['lat'] = df['locations'].map(lambda x: x['latitudeE7'])/10.**7
df['long'] = df['locations'].map(lambda x: x['longitudeE7'])/10.**7 
df['ts_ms'] = df['locations'].map(lambda x: x['timestampMs']).astype(float)/1000

But the same approach fails for altitude and verticalAccuracy: it raises a KeyError.

Also, activity contains a nested structure. How would I map its fields to the DataFrame as well?

David Kim
  • What is the exact code that throws the `KeyError`? For the nested structure, look into `json_normalize`: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.json.json_normalize.html – Peter Leimbigler Nov 11 '18 at 20:47

1 Answer

I've tried to reproduce your problem as follows:

sample = {
    'timestampMs': '1540084102574',
    'latitudeE7': 327160442,
    'longitudeE7': -1171687098,
    'accuracy': 17,
    'altitude': -13,
    'verticalAccuracy': 3,
    'activity': [{
        'timestampMs': '1540083982124',
        'activity': [{
            'type': 'STILL',
            'confidence': 100
            }]
        }]
}

import pandas as pd

# Build a one-row `DataFrame` whose `locations` column holds your sample dictionary
df = pd.DataFrame({'locations': [sample]})

Now, both altitude and verticalAccuracy work fine for me, as these are both keys in the outer dictionary.

df['altitude'] = df['locations'].map(lambda x: x['altitude'])
df['verticalAccuracy'] = df['locations'].map(lambda x: x['verticalAccuracy'])
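One possible cause of the KeyError, which I can't confirm from the single record you posted, is that some records in the full export lack these keys. Bracket access raises KeyError on the first such record, while .get() returns None instead. A minimal sketch, where the second record is a made-up example of an entry without altitude data:

```python
import pandas as pd

records = [
    {'latitudeE7': 327160442, 'altitude': -13, 'verticalAccuracy': 3},
    # Hypothetical record with no altitude information (an assumption,
    # not taken from the original export)
    {'latitudeE7': 327160500},
]
df = pd.DataFrame({'locations': records})

# .get() returns None for a missing key instead of raising KeyError
df['altitude'] = df['locations'].map(lambda x: x.get('altitude'))
df['verticalAccuracy'] = df['locations'].map(lambda x: x.get('verticalAccuracy'))

# Rows that lack the key can then be found with isna()
missing_altitude = df['altitude'].isna()
```

If that is what is happening, missing_altitude.sum() tells you how many records have no altitude.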

For the nested items, note that activity is a list of length one.

type(sample.get('activity'))  # returns `list`
len(sample.get('activity'))  # returns 1

So you'll need to index the first (in this case, index number zero) item of the list. That item will in turn be a Python dictionary, which will need to be accessed through bracket notation or the safer .get() method.

df['timestampMs'] = df['locations'].map(lambda x: x['activity'][0].get('timestampMs'))

You can apply the same logic to the inner activity key nested inside the outer one.

df['type'] = df['locations'].map(lambda x: x['activity'][0].get('activity')[0].get('type'))
df['confidence'] = df['locations'].map(lambda x: x['activity'][0].get('activity')[0].get('confidence'))
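As the comment on the question suggests, json_normalize is another option. The following is only a sketch, assuming pandas 1.0 or later (where it is available as pd.json_normalize) and assuming records is the plain list of location dictionaries; the second record is a hypothetical one without altitude or activity.

```python
import pandas as pd

records = [
    {
        'timestampMs': '1540084102574',
        'latitudeE7': 327160442,
        'longitudeE7': -1171687098,
        'altitude': -13,
        'activity': [{
            'timestampMs': '1540083982124',
            'activity': [{'type': 'STILL', 'confidence': 100}],
        }],
    },
    # Hypothetical record without altitude or activity
    {'timestampMs': '1540084200000', 'latitudeE7': 327160500,
     'longitudeE7': -1171687000},
]

# One row per location; keys missing from a record become NaN
flat = pd.json_normalize(records)

# One row per innermost activity entry, carrying the location timestamp along.
# Records without an 'activity' key are filtered out first.
with_activity = [r for r in records if 'activity' in r]
activities = pd.json_normalize(
    with_activity,
    record_path=['activity', 'activity'],
    meta=['timestampMs'],
)
```

Here flat holds the top-level fields, and activities holds type and confidence alongside the timestampMs of the location they belong to, so the two can be joined on that column if needed.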
dmitriys
  • dmitriys! The first portion of the code to pull the altitude and vertical accuracy worked great. By any chance, what would I have to look up to get a better understanding of why my initial approach didn't work? – David Kim Nov 12 '18 at 17:48