I have a dataframe that looks like this (it represents zones in a 2D space; note that they overlap, and that is OK):
>>> zones = pd.DataFrame(dict(
minx=[-10, -10, -5],
maxx=[10, 10, 5],
miny=[-10, 0, 0],
maxy=[10, 10, 10],
), index=range(1,4))
>>> zones.index.name = "zone"
>>> zones
minx maxx miny maxy
zone
1 -10 10 -10 10
2 -10 10 0 10
3 -5 5 0 10
I have a second dataframe of ordered pairs that looks something like the following (random numbers here since they don't really matter):
>>> pairs = pd.DataFrame(list(zip((uniform(0, 10) for _ in range(10)), (uniform(0,10) for _ in range(10)))), index=range(1,11), columns=["cx", "cy"])
>>> pairs.index.name = "pair"
>>> pairs["zone"] = "??"
>>> pairs
cx cy zone
pair
1 8.405715 2.691102 ??
2 6.645482 1.843225 ??
3 4.123719 8.996641 ??
4 7.003991 9.695182 ??
5 7.296730 1.175356 ??
6 7.960617 9.503888 ??
7 7.694749 6.907869 ??
8 8.308742 5.439141 ??
9 6.404875 5.663983 ??
10 3.361129 3.123590 ??
I want to fill the "zone" series of this dataframe with the correct zone number for each cx, cy pair based on the zone definitions in the first dataframe.
The code I have written to do this is below. However, I am sure there is a much better way to do it using pandas (i.e., without iterating over the zones
columns). How should it be done?
for num, zone in zones.transpose().iteritems():
idx = (
(pairs.cx.gt(zone["minx"]))
& (pairs.cx.lt(zone["maxx"]))
& (pairs.cy.gt(zone["miny"]))
& (pairs.cy.lt(zone["maxy"]))
)
pairs.loc[idx, "zone"] = num
NOTE: The highest zone number wins. So for example, index 5 in the second table above has the approximate ordered pair (7.3, 1.2), and would fall inside of zones 1, 2, AND 3. Therefore, it should be zone 3. However, index 9 with a.o.p. (6.4, 5.7) falls outside of zone 3 but inside zones 1 AND 2. Therefore, it should be assigned zone 2.