This is a follow up question to Geospatial Analytics in Python
The question is similar to https://gis.stackexchange.com/questions/84114/shapely-unable-to-tell-if-polygon-contains-point but reversing lat long seemed to have solved the issue, but it isn't helping me.
The Uber data is given in lat/long which is pretty straight foward, reverse look up gives the address when used in geopandas
The issue however is to look up those lat/long in shapefile. The Uber data looks like:
Trip ID DateTime Stamp Lat Long
00001 2007-01-07T10:56:46+00:00 37.786117 -122.440119
00001 2007-01-07T10:56:50+00:00 37.786564 -122.440209
00001 2007-01-07T10:56:54+00:00 37.786905 -122.440270
00001 2007-01-07T10:56:58+00:00 37.786956 -122.440279
00002 2007-01-06T06:22:35+00:00 37.800224 -122.433520
00002 2007-01-06T06:22:39+00:00 37.800155 -122.434101
00002 2007-01-06T06:22:43+00:00 37.800160 -122.434430
00002 2007-01-06T06:22:47+00:00 37.800378 -122.434527
00002 2007-01-06T06:22:51+00:00 37.800738 -122.434598
00002 2007-01-06T06:22:55+00:00 37.800938 -122.434650
00002 2007-01-06T06:22:59+00:00 37.801024 -122.434889
00002 2007-01-06T06:23:03+00:00 37.800955 -122.435392
00002 2007-01-06T06:23:07+00:00 37.800886 -122.435959
00002 2007-01-06T06:23:11+00:00 37.800811 -122.436275
The shape file polygon boundry looks like
(5979385.645656899, 2110931.7279282957, 5988491.7629433125, 2116394.4427246302)
(5996757.772329897, 2104615.921334222, 6002126.622484565, 2111141.524096638)
(5994970.50687556, 2086244.426253125, 6004106.84030889, 2096245.441356048)
(6005060.663860559, 2117913.4127838016, 6010794.38500464, 2123410.4359104633)
(5999414.325087652, 2098231.5748509616, 6005330.746325642, 2103724.0536953807)
(5990180.636205971, 2101104.2121503055, 5997586.527562141, 2107405.9502029717)
(6005605.349122897, 2109599.6380728036, 6010954.164540723, 2115863.756778136)
(5997399.803198054, 2095859.3430468887, 6002045.244038388, 2100357.5978298783)
(6018974.499877974, 2121660.499777794, 6024740.999827892, 2131294.0001958013)
(5980891.2469763905, 2086337.3158311248, 5992333.58203131, 2097376.2589762956)
(5979838.815354228, 2109536.4948263764, 5990061.512428477, 2115435.3563882113)
(5996370.188459396, 2086085.1349050552, 6006040.649761483, 2089160.6310506314)
(6000325.210404977, 2087887.1444243789, 6011873.615785807, 2095773.4459089637)
(5980631.069675222, 2095815.8703648, 5992293.742215976, 2101164.5775151253)
(6010609.867329061, 2112785.889902383, 6015766.567317471, 2119365.8508238047)
(5991138.3905240595, 2086268.6489737183, 5998688.01650089, 2094657.981276378)
(6004790.221816152, 2100493.380038634, 6011576.786655068, 2109303.3404370546)
(5991505.183097556, 2091674.2884248793, 6000205.414384723, 2102574.580600634)
So the point in polygon/polygon.contains(point) is not working. Looking at the data, the lat long is very small compared the shapely file, I am not sure if I have to convert one unit to another, it looks like totally different metric systems :) Below is the code:
import fiona
import shapely
from shapely.geometry import Point
import geopy
from geopy.geocoders import Nominatim
from shapely.geometry import shape
fc = fiona.open('/home/user/geo/sfo_shapefile/planning_neighborhoods.shp')
print fc.schema
pol = fc.next()
for f in fc:
print shape(f['geometry']).bounds
geom = shape(pol['geometry'])
print "Bigger poly shape" ,shape(pol['geometry']).bounds
geolocator = Nominatim()
for cords in open('/home/user/geo/uber/trips.tsv'):
latlong = cords.split('\t')
p = Point(float(latlong[3]),float(latlong[2]))
p = Point(float(37.783383),float(-122.439594))
if geom.contains(p):
print geolocator.reverse(p).address
Links to Uber data and SFO shapefile are here http://hortonworks.com/blog/magellan-geospatial-analytics-in-spark/#comment-606532