
I currently have a dataset that looks something like this, shown as the dict used to build the DataFrame:

data = {'Line Item': ["India_Tamil_display 5", "India_Tamil_display 5","Indonesia_Arabic_display 1","Indonesia_Arabic_display 1","Indonesia_Arabic_display 1"],
        'Region': ["Puducherry", "Tamil Nadu", "Banten,Indonesia", "Central Java","East Java"],
        'Impressions' :[43,56,23,56,98],
        'Reach' : [32,45,12,43,76]
        }

I have been asked to visualize the impressions/reach/video views on a map in Python. This is my first time visualizing maps in Python, and I have no idea how to do it using just the country name and region name. I have been searching online for hours, but none of the solutions make sense to me. It's a small assignment, so I doubt it requires something as involved as getting latitudes and longitudes first. Any help will be appreciated. Thanks

Edit: I can get the latitude and longitude of an individual place, as shown below, but I'm not sure how to pass a whole column and get the desired results:

from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
geolocator = Nominatim(user_agent="myGeocoder")
location = geolocator.geocode("West Java")
print(location.address)
print((location.latitude, location.longitude))
Trenton McKinney
hyeri

2 Answers


Imports

import pandas as pd
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
import matplotlib.pyplot as plt
import os
# path to proj4-5.2.0-ha925a31_1 for Anaconda distribution
# without this line KeyError: 'PROJ_LIB' may occur when importing Basemap
os.environ['PROJ_LIB'] = r'E:\Anaconda3\pkgs\proj4-5.2.0-ha925a31_1\Library\share'
from mpl_toolkits.basemap import Basemap

Data

data = {'Region': ["Puducherry", "Tamil Nadu", "Banten,Indonesia", "Central Java", "East Java"],
        'Impressions' :[43,56,23,56,98],
        'Reach' : [32,45,12,43,76]}

df = pd.DataFrame(data)

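geolocator = Nominatim(user_agent="myGeocoder")
# wrap geocode so requests are spaced out; Nominatim's usage policy allows at most 1 request per second
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

def geo_location(region: str):
    location = geocode(region)
    # geocode returns None when a region cannot be found; use NaN instead of raising AttributeError
    if location is None:
        return pd.Series([float('nan'), float('nan')])
    return pd.Series([location.latitude, location.longitude])

df[['lat', 'long']] = df['Region'].apply(geo_location)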

# df.head()

           Region  Impressions  Reach        lat        long
       Puducherry           43     32  11.934057   79.830645
       Tamil Nadu           56     45  10.909433   78.366535
 Banten,Indonesia           23     12  -6.478003  105.541028
     Central Java           56     43  -5.625965  110.371649
        East Java           98     76  -7.697740  112.491420
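
Bare region names such as "Central Java" can be ambiguous for a geocoder. The question's 'Line Item' column seems to start with a country name, so one option is to append that country to each region before geocoding. This is a minimal sketch, assuming (the data does not confirm it) that the text before the first underscore in 'Line Item' is always the country; country_from_line_item and build_query are hypothetical helpers:

```python
# Sketch: turn 'Line Item' + 'Region' into more specific geocoder queries.
# ASSUMPTION: the country is the text before the first underscore in 'Line Item'
# (e.g. "India_Tamil_display 5" -> "India"); this convention is inferred, not documented.

def country_from_line_item(line_item: str) -> str:
    """Return the prefix before the first underscore (hypothetical convention)."""
    return line_item.split("_", 1)[0]


def build_query(line_item: str, region: str) -> str:
    """Combine region and country into one query, e.g. 'Central Java, Indonesia'."""
    country = country_from_line_item(line_item)
    # Don't append the country if the region text already mentions it
    if country.lower() in region.lower():
        return region
    return f"{region}, {country}"


line_items = ["India_Tamil_display 5", "India_Tamil_display 5",
              "Indonesia_Arabic_display 1", "Indonesia_Arabic_display 1",
              "Indonesia_Arabic_display 1"]
regions = ["Puducherry", "Tamil Nadu", "Banten,Indonesia", "Central Java", "East Java"]

queries = [build_query(li, r) for li, r in zip(line_items, regions)]
print(queries)
# ['Puducherry, India', 'Tamil Nadu, India', 'Banten,Indonesia',
#  'Central Java, Indonesia', 'East Java, Indonesia']
```

With the question's DataFrame this would be df['Query'] = [build_query(li, r) for li, r in zip(df['Line Item'], df['Region'])], and df['Query'] could then be passed to geo_location instead of df['Region']. The substring check is crude (a country name that happens to appear inside a region name would suppress the suffix), so it is worth inspecting the generated queries.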

Plot

# Set the dimension of the figure
my_dpi=96
plt.figure(figsize=(2600/my_dpi, 1800/my_dpi), dpi=my_dpi)

# Make the background map
# m=Basemap(llcrnrlon=-180, llcrnrlat=-65, urcrnrlon=180, urcrnrlat=80)  # full map
m=Basemap(llcrnrlon=60, llcrnrlat=-15, urcrnrlon=155, urcrnrlat=40)  # SE Asia
m.drawmapboundary(fill_color='#A6CAE0', linewidth=0)
m.fillcontinents(color='grey', alpha=0.3)
m.drawcoastlines(linewidth=0.1, color="white")

# Add a point per position; marker area (s) scales with Impressions
# (cmap is dropped: without a c= argument Matplotlib ignores it)
m.scatter(df['long'], df['lat'], s=df['Impressions'], alpha=0.4)

plt.show()
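
Matplotlib interprets s as the marker area in points², so raw Impressions values work for this small sample but would produce enormous markers with real campaign numbers. A minimal sketch of linear rescaling into a fixed size range; scale_sizes and its 20/400 default bounds are arbitrary choices for illustration:

```python
def scale_sizes(values, min_size=20.0, max_size=400.0):
    """Linearly map values onto [min_size, max_size] for use as scatter marker sizes.

    Matplotlib's `s` is marker area in points**2, so large raw counts
    (e.g. millions of impressions) would otherwise give huge markers.
    """
    lo, hi = min(values), max(values)
    if hi == lo:  # all values equal: avoid division by zero, use the middle size
        return [(min_size + max_size) / 2] * len(values)
    span = hi - lo
    return [min_size + (v - lo) / span * (max_size - min_size) for v in values]


impressions = [43, 56, 23, 56, 98]
sizes = scale_sizes(impressions)
print([round(s, 1) for s in sizes])  # [121.3, 187.2, 20.0, 187.2, 400.0]
```

In the plot above it could replace s=df['Impressions'] as s=scale_sizes(df['Impressions'].tolist()). Linear scaling keeps the ordering of the data; if a few regions dominate, a square-root or log transform before scaling may read better.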


Trenton McKinney
  • Hi, it works well for a small dataset, but for my original data it gives a GeocoderUnavailable: Service not available error. – hyeri Apr 20 '20 at 18:54
  • I have added the following line to add some delay, but it still isn't working: geocode = RateLimiter(geolocator.geocode, min_delay_seconds=2) – hyeri Apr 20 '20 at 19:05
  • @hyeri That's a separate issue with their service. Using Nominatim with the default user_agent is strongly discouraged, as it violates Nominatim’s Usage Policy https://operations.osmfoundation.org/policies/nominatim/ and may possibly cause 403 and 429 HTTP errors. Please make sure to specify a custom user_agent with Nominatim(user_agent="my-application") or by overriding the default user_agent: geopy.geocoders.options.default_user_agent = "my-application". In geopy 2.0 an exception will be thrown when a custom user_agent is not specified. – Trenton McKinney Apr 20 '20 at 19:16
  • @hyeri you will probably need to select a different geocoder https://geopy.readthedocs.io/en/stable/#module-geopy.geocoders and many of them charge for large volumes of data. – Trenton McKinney Apr 20 '20 at 19:23
  • https://stackoverflow.com/questions/58439692/convert-physical-addresses-to-geographic-locations-latitude-and-longitude/58441019#58441019 – Trenton McKinney Apr 20 '20 at 19:40
  • Is there a way to visualize this map without finding the latitude and longitude? – hyeri Apr 20 '20 at 19:59
  • All the services ask for billing – hyeri Apr 20 '20 at 20:00
  • Instead of feeding the DF region column to the function you should extract only the unique values from Region and find lat/long for the unique values and then save that information to a file, so you're not constantly looking up the same value. You can fill in the DF from a dict with the lat & long. That may limit the number of calls you need to make. I don't know another way of visualizing the data w/o lat & long. – Trenton McKinney Apr 20 '20 at 21:28
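
The suggestion in the last comment (geocode each unique region once and save the results to a file) can be sketched as follows. geocode_unique is a hypothetical helper; it accepts any callable that returns an object with latitude/longitude attributes, or None, which matches what geopy geocoders return, so it can be given geolocator.geocode or a RateLimiter-wrapped version. The JSON cache format is an assumption.

```python
import json
from pathlib import Path


def geocode_unique(regions, geocode, cache_path=None):
    """Geocode each distinct region once and return {region: (lat, lon) or None}.

    `geocode` is any callable returning an object with .latitude/.longitude,
    or None when nothing is found (e.g. a RateLimiter-wrapped geopy geocoder).
    If cache_path is given, results are saved as JSON so reruns skip lookups.
    """
    cache = {}
    if cache_path and Path(cache_path).exists():
        stored = json.loads(Path(cache_path).read_text())
        # JSON turns tuples into lists; convert back so cached and fresh entries match
        cache = {k: tuple(v) if v else None for k, v in stored.items()}
    for region in dict.fromkeys(regions):  # unique values, original order kept
        if region in cache:
            continue  # already looked up in an earlier run
        location = geocode(region)
        cache[region] = (location.latitude, location.longitude) if location else None
    if cache_path:
        Path(cache_path).write_text(json.dumps(cache))
    return cache
```

Calling coords = geocode_unique(df['Region'], geocode, cache_path='geocache.json') and then df['lat'] = df['Region'].map(lambda r: coords[r][0] if coords[r] else None) (index 1 for 'long') fills the DataFrame from the dictionary, so repeated regions and reruns cost no extra requests.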

You can use GeoPandas, which bundles a lot of useful geospatial tools:

import geopandas as gp
import geopy
import contextily as ctx

First, create a GeoDataFrame from your data:

data = {'Line Item': ["India_Tamil_display 5", "India_Tamil_display 5","Indonesia_Arabic_display 1","Indonesia_Arabic_display 1","Indonesia_Arabic_display 1"],
    'Region': ["Puducherry", "Tamil Nadu", "Banten,Indonesia", "Central Java","East Java"],
    'Impressions' :[43,56,23,56,98],
    'Reach' : [32,45,12,43,76]
    }

dfg = gp.GeoDataFrame(data)

Then add a geometry column to the GeoDataFrame using GeoPandas' built-in geocoding tool (geopandas.tools.geocode, which relies on geopy):

dfg['geometry'] = gp.tools.geocode(dfg.Region, provider='nominatim', user_agent="add-your-app-name-here").geometry 
dfg.crs = "EPSG:4326"
dfg.head()


Finally, plot the map, using contextily to add a basemap:

dfg = dfg.to_crs(epsg=3857)
ax = dfg.plot(figsize=(16, 10), alpha=0.75, edgecolor='k', marker='o', color='red', markersize=dfg.Reach*5)
ctx.add_basemap(ax)
ax.set_axis_off()

[Figure: Reach Map]
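
The dfg.to_crs(epsg=3857) step reprojects the points from longitude/latitude (EPSG:4326) to Web Mercator, which is the projection contextily's tile basemaps use by default. The sketch below shows the spherical Mercator formulas behind that conversion, only to illustrate what to_crs does here; lonlat_to_web_mercator is a hypothetical helper, and in practice GeoPandas (via pyproj) should do the reprojection.

```python
import math

# WGS84 semi-major axis in metres; Web Mercator (EPSG:3857) uses it as a sphere radius
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Project lon/lat in degrees (EPSG:4326) to Web Mercator x/y in metres.

    y grows without bound towards the poles; Web Mercator maps are
    normally clipped at about +/-85.05 degrees latitude.
    """
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# East Java, using the coordinates from the first answer's table
print(lonlat_to_web_mercator(112.491420, -7.697740))
```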

Feel free to experiment with GeoPandas to add layers and other aesthetics to your map.

For large datasets, you can use geopy's RateLimiter to set a delay (in seconds) between queries.
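
The delay that RateLimiter adds is easy to picture with a stripped-down version. The sketch below is a hypothetical throttle helper written for illustration, not geopy's implementation (RateLimiter also retries failed requests); clock and sleep are injectable only so it can be tested without waiting. One practical point it makes visible: the delay applies only to calls that go through the returned wrapper, so code that still calls geolocator.geocode directly is not rate-limited.

```python
import time


def throttle(func, min_delay_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so consecutive calls start at least min_delay_seconds apart.

    Illustrative only: for real geocoding use geopy's RateLimiter instead.
    """
    last_call = None

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            # Sleep only for whatever part of the minimum gap has not yet elapsed
            wait = min_delay_seconds - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        return func(*args, **kwargs)

    return wrapper
```

Real code should keep using geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1) and then call geocode(...), not geolocator.geocode(...), everywhere a lookup happens.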

hyances