Mapping lat long to boundary of lat longs in another file in Python?

Question

I have two data frames.

df1 has the columns Order_ID, Lat, Long:

Order_ID	Lat	Long
1	32.0455	-76.9876
2	32.5679	-77.3421
3	33.4567	-77.9876

df2 has the columns Category, Lat, Long:

Category	Lat	Long
S1	32.0109	-76.0765
S1	32.8769	-77.5674
S1	33.1987	-78.7654
S2	33.5967	-78.0765
S2	33.8769	-79.5674
S2	34.1987	-79.7654

df1 is order-level data, with a latitude and longitude for each order.

df2 has multiple lat/long points for each category, which together define a boundary on the map for that category.

I want to map each order ID to a category. For example, based on the polygons of S1 and S2, each order would lie in one of the categories.

How can I map the Order_ID in df1 to the Category in df2? Please help with dummy Python pandas code.

please share what `df1` and `df2` look like and what you've tried so far. potential duplicate: https://stackoverflow.com/questions/48097742/geopandas-point-in-polygon — mitoRibo, Aug 31 '22 at 00:07
Please provide enough code so others can better understand or reproduce the problem. — Community, Aug 31 '22 at 01:20

Rob Raymond · Answer 1 · 2022-08-31T16:20:55.063

I tried this with your sample data, but none of the orders falls within the convex hull of any category's points, so I have simulated some data to demonstrate the approach:
1. create a geopandas GeoDataFrame of the orders
2. create a geopandas GeoDataFrame of the convex hull of the points that make up each category
3. sjoin() the two GeoDataFrames to find the association you require
I have also provided a visualisation to better demonstrate how this works.
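To show what the hull and join steps amount to, and why the sample data from the question gives no matches, here is a rough standard-library sketch. It is not the geopandas code: it treats long/lat as plain planar x/y (an assumption that is only reasonable for small areas), and the helper names `cross`, `convex_hull`, `inside_hull` and `assign_category` are made up for illustration.

```python
# Standard-library sketch: build a convex hull per category, then test each order.
# Assumption: (long, lat) pairs are treated as planar (x, y) coordinates.
# All helper names below are hypothetical, not part of any library.

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(point, hull):
    """True if point lies inside or on the edge of a counter-clockwise convex hull."""
    if len(hull) < 3:
        return False
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))

def assign_category(point, hulls):
    """Return the first category whose hull contains the point, else None."""
    for category, hull in hulls.items():
        if inside_hull(point, hull):
            return category
    return None

# Sample data from the question, as (long, lat) tuples.
orders = {1: (-76.9876, 32.0455), 2: (-77.3421, 32.5679), 3: (-77.9876, 33.4567)}
boundaries = {
    "S1": [(-76.0765, 32.0109), (-77.5674, 32.8769), (-78.7654, 33.1987)],
    "S2": [(-78.0765, 33.5967), (-79.5674, 33.8769), (-79.7654, 34.1987)],
}

hulls = {category: convex_hull(pts) for category, pts in boundaries.items()}
result = {order_id: assign_category(pt, hulls) for order_id, pt in orders.items()}
print(result)  # {1: None, 2: None, 3: None}
```

Each category in the sample has only three boundary points, so each hull is a thin triangle and all three orders land outside both of them.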

import geopandas as gpd
import pandas as pd
import numpy as np

# note: gpd.datasets was removed in geopandas 1.0, so this line needs geopandas < 1.0
gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_cities"))
gdf = gdf.loc[gdf["name"].isin(["London", "Paris", "Brussels"])]
# gdf = gdf.sample(10)

# pandas dataframes structured as per question
df1 = pd.DataFrame(
    {"Long": gdf["geometry"].x, "Lat": gdf["geometry"].y, "Order_ID": gdf["name"]}
)
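N = 8  # number of random boundary points per category
# simulate category boundaries: N random points inside the bounding box of a
# 300 km buffer around each city (buffered in UTM so the distance is in metres)
df2 = pd.concat(
    [
        pd.DataFrame(
            {
                "Long": np.random.uniform(r.minx, r.maxx, N),
                "Lat": np.random.uniform(r.miny, r.maxy, N),
                "Category": np.full(N, chr(65 + i)),  # "A", "B", "C", ...
            }
        )
        for i, r in gdf.reset_index()
        .to_crs(gdf.estimate_utm_crs())
        .buffer(3 * 10**5)
        .to_crs(gdf.crs)
        .bounds.iterrows()
    ]
)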

# sample geometry,  not enough orders to work effectively
# df1 = pd.DataFrame(
#     **{
#         "index": [0, 1, 2],
#         "columns": ["Order_ID", "Lat", "Long"],
#         "data": [[1, 32.0455, -76.9876], [2, 32.5679, -77.3421], [3, 33.4567, -77.9876]],
#     }
# )

# df2 = pd.DataFrame(
#     **{
#         "index": [0, 1, 2, 3, 4, 5],
#         "columns": ["Category", "Lat", "Long"],
#         "data": [
#             ["S1", 32.0109, -76.0765],
#             ["S1", 32.8769, -77.5674],
#             ["S1", 33.1987, -78.7654],
#             ["S2", 33.5967, -78.0765],
#             ["S2", 33.8769, -79.5674],
#             ["S2", 34.1987, -79.7654],
#         ],
#     }
# )

# orders as points; Long/Lat are WGS84 coordinates, i.e. EPSG:4326
gdf1 = gpd.GeoDataFrame(
    df1["Order_ID"],
    geometry=gpd.points_from_xy(x=df1["Long"], y=df1["Lat"]),
    crs="epsg:4326",
)

# want convex hull of all points that make up a category
gdf2 = (
    gpd.GeoDataFrame(
        df2["Category"],
        geometry=gpd.points_from_xy(x=df2["Long"], y=df2["Lat"]),
        crs="epsg:4326",
    )
    .dissolve("Category")
    .convex_hull.reset_index()
)

# get association between order and category using geometry
gpd.sjoin(gdf1, gdf2)

	Order_ID	geometry	index_right	Category
158	Brussels	POINT (4.33137074969045 50.83526293533032)	0	A
187	London	POINT (-0.118667702475932 51.5019405883275)	1	B
199	Paris	POINT (2.33138946713035 48.86863878981461)	2	C

visualise

# visualise it...
m = gdf2.explore(height=300, width=500)
gdf1.explore(m=m, color="red")
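If `sjoin()` raises `ValueError: 'right_df' should be GeoDataFrame`, one thing worth checking is that both inputs really are GeoDataFrames, including the hull frame after its last step. Below is a minimal sketch of a variant that assigns the hulls back onto the dissolved frame so the type is kept explicitly. It assumes geopandas 0.10 or later (for the `predicate` argument), uses made-up square zones rather than the question's data, and the names `orders`, `zones` and `joined` are only illustrative.

```python
import geopandas as gpd
import pandas as pd

# Illustrative data only (not from the question): two square zones, three orders.
df1 = pd.DataFrame(
    {"Order_ID": [1, 2, 3], "Lat": [5.0, 25.0, 15.0], "Long": [5.0, 25.0, 15.0]}
)
df2 = pd.DataFrame(
    {
        "Category": ["A"] * 4 + ["B"] * 4,
        "Lat": [0.0, 0.0, 10.0, 10.0, 20.0, 20.0, 30.0, 30.0],
        "Long": [0.0, 10.0, 10.0, 0.0, 20.0, 30.0, 30.0, 20.0],
    }
)

# Orders as point geometries (x = longitude, y = latitude).
orders = gpd.GeoDataFrame(
    df1[["Order_ID"]],
    geometry=gpd.points_from_xy(x=df1["Long"], y=df1["Lat"]),
    crs="EPSG:4326",
)

# One multi-point geometry per category, then replace it with its convex hull.
zones = gpd.GeoDataFrame(
    df2[["Category"]],
    geometry=gpd.points_from_xy(x=df2["Long"], y=df2["Lat"]),
    crs="EPSG:4326",
).dissolve("Category")
zones["geometry"] = zones.convex_hull
zones = zones.reset_index()

# Both sides of the join must be GeoDataFrames, not plain pandas DataFrames.
assert isinstance(orders, gpd.GeoDataFrame)
assert isinstance(zones, gpd.GeoDataFrame)

# A left join keeps orders that fall in no zone; their Category is NaN.
joined = gpd.sjoin(orders, zones, how="left", predicate="within")
print(joined[["Order_ID", "Category"]])
```

Using `how="left"` is a design choice here: it keeps unmatched orders visible in the result rather than silently dropping them, which is what the default inner join would do.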

I am getting this error when I configured this for my use case. Error - ValueError: 'right_df' should be GeoDataFrame, got . I am using the last bit of your code (sjoin) and loading the dfs as pandas df. — Shivam Bindal, Aug 31 '22 at 16:29
Thanks for sharing this but for gdf2, I am still getting this error - ValueError: 'right_df' should be GeoDataFrame, got . Somehow, it is considering gdf2 as pandas df instead of GeoDf. — Shivam Bindal, Aug 31 '22 at 16:50
error is exactly what it says .... `sjoin()` only works on geodataframes. hence reason I have created `gdf1` and `gdf2` from `df1` and `df2` respectively ... — Rob Raymond, Aug 31 '22 at 17:17
