I am using a FCC api to convert lat/long coordinates into block group codes:
import pandas as pd
import numpy as np
import urllib
import time
import json
# getup, getup1, and getup2 make up the url to the api
getup = 'http://data.fcc.gov/api/block/find?format=json&latitude='
getup1 = '&longitude='
getup2 = '&showall=false'
lat = ['40.7127837','34.0522342','41.8781136','29.7604267','39.9525839',
'33.4483771','29.4241219','32.715738','32.7766642','37.3382082','30.267153',
'39.768403','30.3321838','37.7749295','39.9611755','35.2270869',
'32.7554883','42.331427','31.7775757','35.1495343']
long = ['-74.0059413','-118.2436849','-87.6297982','-95.3698028','-75.1652215',
'-112.0740373','-98.4936282','-117.1610838','-96.7969879','-121.8863286',
'-97.7430608','-86.158068','-81.655651','-122.4194155','-82.9987942',
'-80.8431267','-97.3307658','-83.0457538','-106.4424559','-90.0489801']
#make lat and long in to a Pandas DataFrame
latlong = pd.DataFrame([lat,long]).transpose()
latlong.columns = ['lat','long']
new_list = []
def block(x):
for index,row in x.iterrows():
#request url and read the output
a = urllib.request.urlopen(getup + row['lat'] + getup1 + row['long'] + getup2).read()
#load json output in to a form python can understand
a1 = json.loads(a)
#append output to an empty list.
new_list.append(a1['Block']['FIPS'])
#call the function with latlong as the argument.
block(latlong)
#print the list, note: it is important that function appends to the list
print(new_list)
gives this output:
['360610031001021', '060372074001033', '170318391001104', '482011000003087',
'421010005001010', '040131141001032', '480291101002041', '060730053003011',
'481130204003064', '060855010004004', '484530011001092', '180973910003057',
'120310010001023', '060750201001001', '390490040001005', '371190001005000',
'484391233002071', '261635172001069', '481410029001001', '471570042001018']
The problem with this script is that I can only call the api once per row. It takes about 5 minutes per thousand for the script to run, which is not acceptable with 1,000,000+ entries I am planning on using this script with.
I want to use multiprocessing to parallel this function to decrease the time to run the function. I have tried to look in to the multiprocessing handbook, but have not been able to figure out how to run the function and append the output in to an empty list in parallel.
Just for reference: I am using python 3.6
Any guidance would be great!