Pandas iterate over two dataframes

Question

I have two dataframes: one holds 932,060 coordinate points and the other holds 13,205 rectangular polygons.

      PolygonName                                 PolygonCoordinates
0           JZ221  16.509907001328 42.942029002482 16.51009100175...
1           JZ222  16.510091001752 42.960758994106 16.51027300282...
2           JZ248  16.527602997503 42.904377009196 16.52778999695...
3           JZ249  16.527789996959 42.92310700082 16.527975996737...
4           JZ250  16.527975996737 42.941837994914 16.52815999716...
...           ...                                                ...
13200    NB484625  31.663816002416 38.701211008476 31.66485300095...
13201    NB484781  31.677563999715 38.616109998867 31.67861600195...
13202    NB484782  31.678616001952 38.637080008588 31.67966399693...
13203    NB484783  31.679663996936 38.658051002215 31.68070900143...
13204    NB484784  31.680709001432 38.679022998312 31.68175099835...

[13205 rows x 2 columns]


        point_no   Latitude  Longitude
0              1  24.673719  46.708474
1              2  24.673720  46.708474
2              3  24.673722  46.708474
3              4  24.673723  46.708474
4              5  24.673724  46.708474
...          ...        ...        ...
932055    932056  24.818875  46.618623
932056    932057  24.818889  46.618653
932057    932058  24.818904  46.618690
932058    932059  24.818919  46.618728
932059    932060  24.818932  46.618768

[932060 rows x 3 columns]

I want to iterate over both dataframes and add a new PolygonName column to the points dataframe that names the polygon (from the polygons dataframe) containing each point:

from shapely.geometry import Polygon,Point
import pandas as pd

polygons = pd.read_excel("polygons.xlsx")
points = pd.read_csv("points.csv")

for polygon_index , polygon_row in polygons.iterrows():
    polyString = polygon_row["PolygonCoordinates"]
    polyList = polyString.split(" ")

    polygonPoint1 = (float(polyList[0]) , float(polyList[1]))
    polygonPoint2 = (float(polyList[2]) , float(polyList[3]))
    polygonPoint3 = (float(polyList[4]) , float(polyList[5]))
    polygonPoint4 = (float(polyList[6]) , float(polyList[7]))
    #create shapely Polygon object from coordinates
    polygon = Polygon([ polygonPoint1 , polygonPoint2 , polygonPoint3 , polygonPoint4 , polygonPoint1 ])

    for point_index , point_row in points.iterrows():
        #create shapely Point object from Latitude and Longitude
        Point_X = float(point_row["Longitude"])
        Point_Y = float(point_row["Latitude"])
        point = Point(Point_Y, Point_X)
        #check if polygon contains the point
        if polygon.contains(point):
            points.loc[point_index , "PolygonName"] = polygon_row["PolygonName"]

print(points)

The output should look like this:

        point_no   Latitude  Longitude PolygonName
0              1  24.673719  46.708474    RH275435
1              2  24.673720  46.708474    RH275435
2              3  24.673722  46.708474    RH275435
3              4  24.673723  46.708474    RH275435
4              5  24.673724  46.708474    RH275435
...          ...        ...        ...         ...
932055    932056  24.818875  46.618623    JZ249
932056    932057  24.818889  46.618653    JZ249
932057    932058  24.818904  46.618690    JZ249
932058    932059  24.818919  46.618728    JZ241
932059    932060  24.818932  46.618768    JZ242

This works fine for a small number of points, but as the point count rises it takes far too long, because every point is tested against every polygon. How can I solve this efficiently?

Please provide a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) including a small sample dataframe and the expected output, following [these guidelines](https://stackoverflow.com/a/20159305/15873043). — fsimonjetz, May 26 '22 at 12:17
Your problem is that you are looping over the points dataframe for EVERY ROW of the Polygon dataframe. Instead, you should loop once over the polygon dataframe, creating a dictionary as you go, then loop once over the points dataframe, using the dictionary to add polygons to your points. — Zorgoth, May 26 '22 at 12:28

Zorgoth · Answer 1 · 2022-05-26T13:05:07.533

This code might have bugs, but this is a method to follow. Your problem, as I mentioned in my comment, is the nested loop. You can eliminate the nested loop by creating a dictionary of points to polygons in the first loop over Polygons.

from shapely.geometry import Polygon,Point
import pandas as pd

polygons = pd.read_excel("polygons.xlsx")
points = pd.read_csv("points.csv")
#p2P maps each polygon corner (a coordinate tuple) to the set of polygon names using that corner
p2P = {}
for polygon_index , polygon_row in polygons.iterrows():
    polyString = polygon_row["PolygonCoordinates"]
    polyList = polyString.split(" ")

    point1 = (float(polyList[0]) , float(polyList[1]))
    point2 = (float(polyList[2]) , float(polyList[3]))
    point3 = (float(polyList[4]) , float(polyList[5]))
    point4 = (float(polyList[6]) , float(polyList[7]))
    #create shapely Polygon object from coordinates
    polygon = Polygon([ point1 , point2 , point3 , point4 , point1 ])
    pointlist = [point1, point2, point3, point4]
    for point in pointlist:
        if point in p2P:
            p2P[point].add(polygon_row["PolygonName"])
        else:
            p2P[point] = {polygon_row["PolygonName"]}
polygonColumn = []
for point_index , point_row in points.iterrows():
    #build the lookup key from Latitude and Longitude
    Point_X = float(point_row["Longitude"])
    Point_Y = float(point_row["Latitude"])
    #key order follows the question's Point(Point_Y, Point_X), i.e. (Latitude, Longitude)
    Polygon_names = p2P.get((Point_Y, Point_X), set())
    polygonColumn.append(Polygon_names)
points['PolygonNames'] = polygonColumn
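
Note that the dictionary above is keyed by the polygons' corner coordinates, so the lookup only finds points that coincide exactly with a corner. A rough sketch of one way to keep a real containment test without checking every point against every polygon is a uniform grid index: each polygon is filed under the grid cells its bounding box touches, and each point is tested only against the polygons in its own cell. Everything named here (`CELL`, `parse_vertices`, `contains`, `build_grid`, `find_polygon`, and the sample data) is hypothetical and not from the original post, and a plain ray-casting test stands in for shapely's `polygon.contains` so the sketch runs without extra dependencies. Points lying exactly on an edge may be classified either way.

```python
from collections import defaultdict
from math import floor

CELL = 0.05  # hypothetical grid cell size in degrees; tune it to the polygon size


def parse_vertices(coord_string):
    """Turn 'a1 b1 a2 b2 ...' into [(a1, b1), (a2, b2), ...]."""
    values = [float(v) for v in coord_string.split()]
    return list(zip(values[0::2], values[1::2]))


def contains(vertices, p):
    """Ray-casting test: True if point p lies inside the polygon."""
    px, py = p
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the line through p in the second coordinate count.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def build_grid(polygons):
    """Map each grid cell to the polygons whose bounding box overlaps it."""
    grid = defaultdict(list)
    for name, vertices in polygons:
        xs = [x for x, _ in vertices]
        ys = [y for _, y in vertices]
        for cx in range(floor(min(xs) / CELL), floor(max(xs) / CELL) + 1):
            for cy in range(floor(min(ys) / CELL), floor(max(ys) / CELL) + 1):
                grid[(cx, cy)].append((name, vertices))
    return grid


def find_polygon(grid, p):
    """Return the name of the first candidate polygon containing p, else None."""
    cell = (floor(p[0] / CELL), floor(p[1] / CELL))
    for name, vertices in grid.get(cell, []):
        if contains(vertices, p):
            return name
    return None


# Made-up sample data in the same "lat lon lat lon ..." layout as the question.
sample_polygons = [
    ("A", parse_vertices("24.0 46.0 24.0 46.1 24.1 46.1 24.1 46.0")),
    ("B", parse_vertices("24.1 46.0 24.1 46.1 24.2 46.1 24.2 46.0")),
]
grid = build_grid(sample_polygons)
sample_points = [(24.05, 46.05), (24.15, 46.02), (30.0, 40.0)]
names = [find_polygon(grid, p) for p in sample_points]
print(names)  # ['A', 'B', None]
```

With the question's dataframes, the same list comprehension could presumably run over `zip(points["Latitude"], points["Longitude"])` and the result be assigned to `points["PolygonName"]`. Library routes to the same idea, such as shapely's `STRtree` or a geopandas spatial join, may be preferable in practice.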

I couldn't understand how you checked whether a point is inside a polygon. — M.SEL, May 26 '22 at 12:51
I created a dictionary p2P, and p2P[(x,y)] is defined so that it is equal to the set of names of Polygons containing that point. — Zorgoth, May 26 '22 at 12:52
I corrected the bugs in your solution and ran it, but only empty dictionaries were returned in the PolygonNames column. — M.SEL, May 26 '22 at 13:06
Probably a problem with floating-point values. Try rounding them to a couple of decimal places. — Zorgoth, May 26 '22 at 13:20
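
The rounding suggestion could be sketched roughly as follows. `DECIMALS`, `vertex_key`, and the sample values are hypothetical, chosen only for illustration. Rounding makes nearly equal floats share a dictionary key, but the lookup still matches only points that land on a polygon corner after rounding, not points lying inside a polygon, and values close to a rounding boundary can still end up under different keys.

```python
from collections import defaultdict

DECIMALS = 4  # hypothetical precision; pick it to match the data


def vertex_key(lat, lon):
    """Round a coordinate pair so nearly equal floats map to the same key."""
    return (round(lat, DECIMALS), round(lon, DECIMALS))


# Made-up polygon corner, stored under its rounded key.
p2P = defaultdict(set)
p2P[vertex_key(24.673719001328, 46.708474002482)].add("DEMO")

# An unrounded key misses, because the stored key was rounded.
exact = p2P.get((24.673719, 46.708474), set())
# Rounding the point the same way finds the entry.
rounded = p2P.get(vertex_key(24.673719, 46.708474), set())
print(exact, rounded)  # set() {'DEMO'}
```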
