
I am trying to create columns for the geolocation of countries from a CSV.

So far I am able to create a new column with both latitude and longitude by mapping over the columns and applying the geolocate() function.

Dataframe

geolocation
(-34.9964963, -64.9672817)

Expected output:

geolocation                   latitude     longitude
(-34.9964963, -64.9672817)  -34.9964963    -64.9672817

I am mapping over the column, so I am not sure how to extract just the latitude and the longitude and put each into its own column.

def add_geolocation(df, country_column):
  df["geolocation"] = country_column.map(lambda x: geolocate(x))
  return df


add_geolocation(df=df, country_column=df["country"])

In the function geolocate() I return both of them.

def geolocate(country):
  # Location
  loc = geolocator.geocode(country, timeout=10000)
  # Latitude
  lat = get_latitude(loc)
  # Longitude
  long_ = get_longitude(loc)
  # Address 
  add = get_address(loc)
  return lat, long_

Would it be possible to specify in the lambda function that I only want the latitude?

For example, something like latitude, longitude = geolocate(country), and then use only the latitude value.

yudhiesh
  • What is the question? – balderman Oct 30 '20 at 15:29
  • Hi, apologies for not wording it right. I am trying to get the `lat` and `long` separately while mapping over the column using the lambda function `country_column.map(lambda x: geolocate(x))`. – yudhiesh Oct 30 '20 at 15:31
  • And what is the problem? Do you get any error? Does the expected output differ from the actual output? – balderman Oct 30 '20 at 15:32
  • Yes, it works fine, but it returns both of them together. I am trying to get just the latitude and make a column with it, and the same for the longitude. – yudhiesh Oct 30 '20 at 15:34
  • `return lat, long_` - this is why it returns both. Your code returns both. See https://stackoverflow.com/questions/16236684/apply-pandas-function-to-column-to-create-multiple-new-columns for how to create multiple columns. – balderman Oct 30 '20 at 15:35
  • Yes, I am aware of that, but I am asking if I could specify in the lambda function that I want to use just the `latitude`. For example, `latitude, longitude = geolocate(country)`, then use only the `latitude` value. – yudhiesh Oct 30 '20 at 15:39
  • As a note, you don't need a lambda here at all. Just run `country_column.map(geolocate)`. – Metropolis Oct 30 '20 at 15:42
  • I have edited the question to reflect that. – yudhiesh Oct 30 '20 at 15:49

2 Answers


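You could do this instead of using map and a lambda function. The .str accessor indexes into each element of the column, which also works when the elements are tuples:

df["latitude"] = df["geolocation"].str[0]
df["longitude"] = df["geolocation"].str[1]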

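And as Ben said in a comment, it can be even shorter. Note that this form relies on iterating over the .str accessor, which newer pandas versions may warn about or reject, so the two-line form above is the safer choice:

df["latitude"], df["longitude"] = df["geolocation"].str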
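
As a minimal sketch of the indexing approach, assuming a small hand-built DataFrame in place of real geocoder output (the second row is a placeholder value, not a real location):

```python
import pandas as pd

# Sample data standing in for the output of geolocate().
# The second tuple is a made-up placeholder, not a real coordinate.
df = pd.DataFrame({
    "geolocation": [(-34.9964963, -64.9672817), (10.0, 20.0)],
})

# .str[i] takes element i of each value in the column;
# it works on tuples and lists as well as on strings.
df["latitude"] = df["geolocation"].str[0]
df["longitude"] = df["geolocation"].str[1]

print(df)
```

This keeps the original geolocation column, so geolocate() only has to be called once per country.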
Rivers

You can use zip to unpack the tuple inside the geolocation column:

def add_geolocation(df, country_column):
    df["geolocation"] = country_column.map(geolocate)
    df['lat'], df['long'] = zip(*df['geolocation'])
    return df

Edit: what does zip(*df['geolocation']) do? This is a combination of two different concepts: the star (*) operator and the zip function.

The * operator unpacks the collection into positional arguments. The following two calls are equivalent:

def f(a, b):
    return a + b

f(1, 2) # return 3

lst = [1,2]
f(*lst) # return 3

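The zip function groups elements from the input collections position by position. In Python 3 it returns a lazy iterator, so list() is used here to show the result:

list(zip([1,2], ['A', 'B'], ['One', 'Two'])) # returns [(1, 'A', 'One'), (2, 'B', 'Two')]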

We combine the two here to split the first elements of the geolocation tuples into a separate collection, and the second elements into another collection:

geolocation
(1,2)
(3,4)
(5,6)

# zip(*df['geolocation']) is equivalent to zip((1,2), (3,4), (5,6)),
# which yields (1,3,5) and then (2,4,6).
# The first tuple is the collection of latitudes, the second is the longitudes.
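
Putting the pieces together, here is a self-contained sketch of the zip approach. The geolocate() below is a stub backed by a hypothetical lookup dict (FAKE_COORDS, with placeholder values apart from the first entry), used only so the example runs without a real geocoder:

```python
import pandas as pd

# Hypothetical lookup table replacing the real geocoder call.
# Only the first entry comes from the question; the rest are placeholders.
FAKE_COORDS = {
    "Argentina": (-34.9964963, -64.9672817),
    "CountryB": (1.0, 2.0),
    "CountryC": (3.0, 4.0),
}


def geolocate(country):
    """Stub: return a (latitude, longitude) tuple for a country name."""
    return FAKE_COORDS[country]


def add_geolocation(df, country_column):
    # One geolocate() call per row, stored as a tuple.
    df["geolocation"] = country_column.map(geolocate)
    # zip(*...) transposes the tuples: first all latitudes, then all longitudes.
    df["lat"], df["long"] = zip(*df["geolocation"])
    return df


df = pd.DataFrame({"country": ["Argentina", "CountryB", "CountryC"]})
df = add_geolocation(df, df["country"])
print(df)
```

Because the tuples are unpacked after the mapping, each country is geocoded only once, which matters when the real geolocate() makes a network request.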
Code Different
  • Thank you, that worked, but I am not sure what `*df['geolocation']` does. Could you explain that? – yudhiesh Oct 30 '20 at 15:44