Group lat/lon , if they occur witin a bounding box

Question

I have a dataframe as follows:

I created a bounding box of 10km by 10km.

def create_space(lat, lon, s=10):
    n = (180/math.pi)*(500/6378137)*10
    return lat - n, lon - n, lat + n, lon + n

Now, I would like to see if any two or more rows in my dataframe (lat and lon) are within the bounding box. If any lat and lon fall within the bounding box, I would like to add the occurrence. for eg, if index[9] fall on the bounding box of index[0], the occurrence would be 6495+23 and index[9] would be deleted.

I got this:

I tried

step =0.1
to_bin = lambda x: np.floor(x / step) * step
df["latbin"] = df.lat.map(to_bin)
df["lonbin"] = df.lon.map(to_bin)
#groups = df.groupby(("latbin", "lonbin"))

But it didnot solve my problem and I dont know how to move further.

Does this answer your question? [Is a lat lon within a bounding box?](https://stackoverflow.com/questions/21190923/is-a-lat-lon-within-a-bounding-box) — Thekingis007, Nov 15 '21 at 14:38
Yes it partially does. But I want to check if any lon and lat is inside the the bounding box and if it is then add the occurrence and merge the row. — Nickey, Nov 15 '21 at 23:28
Hi Nicky - welcome to stackoverflow! [Please do not upload images of code/errors when asking a question.](//meta.stackoverflow.com/q/285551) - images cannot be indexed/searched, can't be read by screen readers, are prone to breaking, etc. Instead, use a formatted code block when showing your inputs/outputs. Thanks! — Michael Delgado, Nov 16 '21 at 23:14
Also, can you describe why it didn't solve your problem? if it caused an error, please provide a [traceback](//realpython.com/python-traceback/). The groupby approach looks like the right one so it would help us to know more about what's going wrong. — Michael Delgado, Nov 16 '21 at 23:23

score 1 · Answer 1 · answered Nov 17 '21 at 01:43

My solution is to create a geopandas.geoseries.GeoSeries which represents your bounding box on map. Then there are existing tools in Python to test if a point is inside it or not.

But since I don't have your data, here I just use the simplest example to show you how my codes work.

# import packages
import geopandas as gpd
from shapely.ops import cascaded_union
from shapely import geometry

# create a function to build the "geopandas.geoseries.GeoSeries" for your bounding box
def create_boundingbox(p1,p2,p3,p4): 
    '''provide four corner points as (lon,lat), 
       the order is bottom-left(p1), bottom-right(p2), top-right(p3),top-left(p4) on a map'''
    p1 = geometry.Point(p1)
    p2 = geometry.Point(p2)
    p3 = geometry.Point(p3)
    p4 = geometry.Point(p4)
    pointList = [p1, p2, p3, p4, p1]
    boundingbox = geometry.Polygon([[p.x, p.y] for p in pointList])
    boundingbox = gpd.GeoSeries(cascaded_union(boundingbox))
    return boundingbox

# use some simple points as an example

# construct your box
p1 = (0,0)
p2 = (1,0)
p3 = (1,1)
p4 = (0,1)

box1 = create_boundingbox(p1,p2,p3,p4)

# now test if these points are inside or not
p5 = geometry.Point(0.5,0.5)
p6 = geometry.Point(15,15)

print(box1.contains(p5)) # this is True
print(box1.contains(p6)) # this is False

ramzeek · Answer 2 · 2021-11-17T15:19:19.530

Note that when using lon-lat, your are implying a globe of some sort, and that a set of points describes two possible spherical polygons on the surface of the spehre. Here I'm assuming you want the smaller one. If you order points in a counter-clockwise direction then you can use the approach described here using linear algebra.

This takes into account conditions where you are circumnavigating a pole or an antimeridian, and looks at the great circles connecting points (so the line connecting (000, 45) and (100, 45) does not follow the 45 degree parallel. So, think carefully about what you mean by in a square as squares (the way we normally think about them) do not drape well on a sphere.

Depending on your application, Jeremy's answer might be sufficient, but in some cases it might not be.

The approach I'm suggesting below is based on the answer to a question that I asked here, which explains the math of what I am implementing here.

First you will need to convert your points to vectors (you can use a unit sphere)

import numpy as np
def deg2rad(theta):
    return(theta * np.pi / 180)
def get_xyz(p):
    lon = deg2rad(p[0])
    lat = deg2rad(p[1])
        
    return(np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)]))

So for your four points create four vectors describing the location in cartesian coordinates. For example,

p1 = [ 170, -10]
p2 = [-170, -10]
p3 = [-170, 10]
p4 = [ 170, 10]
v1 = get_xyz(p1)
v2 = get_xyz(p2)
v3 = get_xyz(p3)
v4 = get_xyz(p4)

Each side of the polygon is a segment of a great circle and the vector normal to the plane is n1 = np.cross(v1, v2), etc:

n1 = np.cross(v1, v2)
n2 = np.cross(v2, v3)
n3 = np.cross(v3, v4)
n4 = np.cross(v4, v1)

If some point v5 is inside the polygon described by v1, v2, v3, and v4, then the dot product of v5 with each of the n-vectors will be greater than 1. (If it is on the edge of one of the sides then it will be equal to 1.)

p5 = [180, 0]
v5 = get_xyz(p5)

in_poly = (np.dot(v5, n1)) > 0 and \
    (np.dot(v5, n2)) > 0 and \
    (np.dot(v5, n3)) > 0 and \
    (np.dot(v5, n4)) > 0

print(in_poly) # True

p5 = [180, 20]
v5 = get_xyz(p5) 

in_poly = (np.dot(v5, n1)) > 0 and \
    (np.dot(v5, n2)) > 0 and \
    (np.dot(v5, n3)) > 0 and \
    (np.dot(v5, n4)) > 0

print(in_poly) # False

EDIT:

I realized this morning that I didn't explain how to bin the data. I am putting it in a dictionary here but you can make a column in a dataframe to do something similar. Also, not that my grid boxes need to be convex (no pacman shapes) and the corners of your grid point need to be provided in a counter-clockwise order.

def in_poly(poly, point):
   v_vec = [get_xyz(p) for p in poly]
   n_vec = [np.cross(v_vec[i], v_vec[(i+1)%len(v_vec)]) for i in range(len(v_vec))]
   v_p = get_xyz(point)
   dot_prod = [np.dot(n, v_p) for n in n_vec]
   if all(d > 0 for d in dot_prod):
      return True
   else:
      return False

p1 = [ 170, -10]
p2 = [-170, -10]
p3 = [-170,  10]
p4 = [ 170,  10]
p5 = [ 150, -10]
p6 = [ 150,  10]
P = {1: [p1, p2, p3, p4],
     2: [p1, p4, p6, p5]}
InBox = {1: 0, 2: 0}

Npts = 100
lonlim = [150, 190]
latlim = [-20, 20]
points = np.stack([np.random.randint(lonlim[0], lonlim[1], Npts),
                   np.random.randint(latlim[0], latlim[1], Npts)])


for i in range(Npts):
   for key in InBox.keys():
      if in_poly(P[key], points[:,i]):
         InBox[key] += 1

print(InBox) 
#{1: 25, 2: 22}

Group lat/lon , if they occur witin a bounding box

2 Answers2