I have a function that computes latitude and longitude values for the north-west and south-east corners around an address. I want to store these four values as columns in a data frame.

Here is my code:

import geocoder
import pandas as pd
import geolib
from geolib import geohash

df = pd.read_csv('New_DP2.csv')

key = [redacted]


fields = ['NWLat', 'NWLong', 'SELat', 'SELong']
def getData(address, key):
    g = geocoder.mapquest(address, key=key)
    lat = g.lat
    lng = g.lng
    h = geolib.geohash.encode(lat, lng, 7)
    hashes = geolib.geohash.neighbours(h)
    NW = geohash.decode(hashes.nw)
    SE = geohash.decode(hashes.ne)
    nwlat = NW.lat
    nwlon = NW.lon
    selat = SE.lat
    selon = SE.lon

I want to add four columns to the data frame, one each for 'nwlat', 'nwlon', 'selat', and 'selon'.

Normally I would simply return nwlat and then apply the function with a lambda:

df['NWLat'] = df.apply(lambda row: getData(row['a'], key), axis = 1)

I would then repeat this for each of the other three values. But that runs the function four times per row instead of just once.

cs95
Wolfy
  • Ok, so normally you'd use `df.apply`. What problem are you encountering in this case? – Peter Leimbigler Feb 28 '19 at 01:07
  • @PeterLeimbigler Please see edit, apologies if I was not clear. – Wolfy Feb 28 '19 at 02:51
  • Are you trying to construct bounding boxes with this data? You might want to consider using `geopandas` or a spatial database (just a suggestion, unrelated to the question here). – cs95 May 14 '19 at 20:37

1 Answer

You were quite close. All you needed to do was to figure out how to return the result appropriately. Your function will need to look like this:

def getData(address, key):
    ...
    NW = geohash.decode(hashes.nw)
    SE = geohash.decode(hashes.ne)

    return pd.Series(dict(zip(fields, [NW.lat, NW.lon,  SE.lat, SE.lon]))) 

You can then use Series.apply:

df = pd.DataFrame({'address': ['Los Angeles, CA']})  # for example
df['address'].apply(getData, key=key)

                 NWLat                 NWLong                SELat                 SELong
0  34.0541839599609375  -118.2451629638671875  34.0541839599609375  -118.2424163818359375

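This works by having getData return a Series object (with the fields as the index). When the applied function returns a Series, apply stacks the per-row results into a DataFrame, using each Series index as the column names. Note that SELat equals NWLat in the output above because the code, as in the question, decodes hashes.ne (the north-east neighbour); for a south-east corner you would presumably decode hashes.se instead.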
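To see the mechanism without calling the geocoding service, here is a minimal sketch using a stand-in function. The name `fake_corners`, its `pad` parameter, and the values it produces are hypothetical placeholders, not part of geocoder or geolib; only the pandas behaviour is the point.

```python
import pandas as pd

fields = ['NWLat', 'NWLong', 'SELat', 'SELong']

def fake_corners(address, pad):
    # Hypothetical stand-in for getData: no network call, the numbers are
    # derived from the string length purely for illustration.
    base = float(len(address))
    values = [base + pad, -base - pad, base - pad, -base + pad]
    # Returning a Series indexed by the field names is what makes
    # apply expand the result into one column per field.
    return pd.Series(dict(zip(fields, values)))

df = pd.DataFrame({'address': ['Los Angeles, CA', 'Austin, TX']})

# Extra keyword arguments to Series.apply are forwarded to the function.
out = df['address'].apply(fake_corners, pad=0.5)

print(type(out).__name__)
print(list(out.columns))
```

The function runs once per row and still yields all four columns, which is what the question was after.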

Side note: To concatenate these columns to the existing df, call pd.concat:

res = pd.concat([df, df['address'].apply(getData, key=key)], axis=1)
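As a small sketch of what the concatenation does, assuming the four columns have already been computed: `pd.concat` with `axis=1` aligns the two frames on their index, and `DataFrame.join` gives the same result here. The `corners` frame and its numbers below are made-up placeholders, not real geocoding output.

```python
import pandas as pd

# Placeholder data: a non-default index makes the alignment visible.
df = pd.DataFrame({'address': ['Los Angeles, CA', 'Austin, TX']}, index=[10, 20])
corners = pd.DataFrame({'NWLat': [1.0, 2.0], 'NWLong': [3.0, 4.0]}, index=[10, 20])

# Column-wise concatenation, matched on index labels.
res = pd.concat([df, corners], axis=1)

# An equivalent spelling for this case: a left join on the index.
joined = df.join(corners)

print(res)
```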

Another option, if there are no NaNs in the address column, is a list comprehension. This is a performance (and memory) micro-optimization, since it avoids building one Series per row.

def getData2(address, key):
    ...
    NW = geohash.decode(hashes.nw)
    SE = geohash.decode(hashes.ne)

    return [NW.lat, NW.lon,  SE.lat, SE.lon]

pd.DataFrame([getData2(a, key) for a in df['address']], columns=fields)

                 NWLat                 NWLong                SELat                 SELong
0  34.0541839599609375  -118.2451629638671875  34.0541839599609375  -118.2424163818359375
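If the address column may contain missing values, one possible way to keep the list comprehension is to guard each element and fill the row with NaN. This is only a sketch under assumptions: `fake_corners_list` is a hypothetical stand-in for getData2 that makes no network call, and its values are placeholders.

```python
import pandas as pd

fields = ['NWLat', 'NWLong', 'SELat', 'SELong']

def fake_corners_list(address):
    # Hypothetical stand-in for getData2: returns four placeholder floats.
    base = float(len(address))
    return [base, -base, base - 1.0, -base + 1.0]

df = pd.DataFrame({'address': ['Los Angeles, CA', None, 'Austin, TX']})

# Row used when the address is missing or not a string.
empty = [float('nan')] * len(fields)

rows = [fake_corners_list(a) if isinstance(a, str) else empty
        for a in df['address']]

# Reuse df's index so the new columns line up when concatenated.
res = pd.concat([df, pd.DataFrame(rows, columns=fields, index=df.index)], axis=1)
print(res)
```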

More information on list comprehensions and their benefits is detailed in my post: For loops with pandas - When should I care?

cs95