2

I have a two data frames with lat-lon values and some additional information I want to merge two dataFrames in range of values

df1 

lat1 lon1 lat2 lon2 a1 a2 a2
1.0   1.0  4.0  7.0 a  d  p
2.0   2.0  5.0  8.0 b  e  q
3.0   3.0  6.0  9.0 c  f  r

This is of size 30 million rows

df2

lat  lon  x 
1.0  1.0  m
1.0  2.0  n
2.0  3.0  o

This is of size 20 million rows

Now for each row in df1 i want to merge with df2 having all the lat in df2 between lat1 and lat2 and lon in lon1 and lon2

lat1<=lat<=lat2
lon1<=lon<=lon2

I have tried
Best way to join / merge by range in pandas
how to perform an inner or outer join of DataFrames with Pandas on non-simplistic criterion

But still i run out of memory.
What is the most optimal way to do this?

Bimal Gangawal
  • 385
  • 2
  • 17
  • Try Geopandas spacial joins. https://geopandas.org/mergingdata.html?highlight=merge#spatial-joins – Subbu VidyaSekar May 06 '20 at 06:10
  • @BimalGangawal have a look at this link - [Fastest way to merge pandas dataframe on ranges](https://stackoverflow.com/questions/46179362/fastest-way-to-merge-pandas-dataframe-on-ranges/46179615) – tidakdiinginkan May 06 '20 at 08:54
  • @tidakdiinginkan Thanks for a good lead but it seems not working due to this https://github.com/pandas-dev/pandas/issues/20369 – Bimal Gangawal May 06 '20 at 10:12

1 Answers1

0
import geopandas as gp
from shapely.geometry import Polygon, Point
df2_geo = gp.GeoDataFrame(df2, geometry=gp.points_from_xy(df2.latitude, df2.longitude))

def make_polygon(a,b,c,d):
    lat_list = [a,a,c,c]
    lon_list = [b,d,d,b]
    return Polygon(zip(lat_list, lon_list))

df1['geometry'] = df1.apply(lambda x: make_polygon(x['_from_latitude'], x['_from_longitude'], x['_to_latitude'],x['_to_longitude']),axis=1)              
df1_gep = gp.GeoDataFrame(df1)
result = geopandas.sjoin(df2_geo, df1_gep, how="inner", op='intersects')

@SubbuVidyaSekar Thanks for suggestion
This works perfectly fine

Bimal Gangawal
  • 385
  • 2
  • 17