2

I have a geopandas dataframe that contains a complex area and some Point objects representing latitude and longitue inside that same area, both reading from .kml and .xlsx files, not defined by me. Thing is that those points are really close to each other, and when ploting the whole map they overlap, making it really difficult to spot each one individually, specially when using geoplot.kdeplot()

So what I would like to do is to find some way to equally space them, respecting the boundaries of my area. For sake of simplicity, consder the Polygon object as the area and the Point objects as the ones to resample:

import random
import matplotlib.pyplot as plt

y = np.array([[100,100], [200,100], [200,200], [100,200], [50.0,150]])
p = Polygon(y)

points = []
for i in range(100):
    pt = Point(random.randint(50,199),random.randint(100,199))
    if p.contains(pt):
        points.append(pt)

xs = [point.x for point in points]
ys = [point.y for point in points]

fig = plt.figure()
ax1=fig.add_subplot(111)
x,y = p.exterior.xy
plt.plot(x,y)
ax1.scatter(xs, ys)
plt.show()

That gives something like this:

enter image description here

Any ideas on how to do it witout crossing the area? Thank you in advance!

EDIT:

Important to mention that the resample should not be arbitrary, meaning that a point in coordinates (100,100), for example, should be somewhere near its original location.

Pedro de Sá
  • 760
  • 6
  • 19
  • Do you want to distribute points as how each point have the same distance value to its nearest neighbors and this value be constant between each pair of nearest neighbors?? – Ali_Sh May 17 '22 at 02:59
  • I think that could work. I would like to all points to be "visible", meaning that the points that are really close to each other, wouldn't be as close but near each other, respecting this distance. Maybe a fixed distance along X and Y coords could work, if I understood your suggestion – Pedro de Sá May 17 '22 at 16:38
  • 1
    You can do it by various methods. Iterations may be very easy. see [lloyd](https://github.com/duhaime/lloyd/blob/master/plots/introduction.gif) too. I meant from my ex comment uniform distribution (like: [1](https://stackoverflow.com/a/62047023/13394817), [2](https://math.stackexchange.com/a/832448/1012144)). It seems you want just move the points that are in a specified distance of each other (neighbors with very small distances) – Ali_Sh May 17 '22 at 17:17
  • 1
    How many points (maximum number) we may have. what is the magnitude of the diameter, I mean, if two points have the same coordinate how much long they must get away from each other? – Ali_Sh May 17 '22 at 17:54
  • In my real case (the one from the question is just an example due to confidenciality of informations) we have 329 inside a complex polygon. In reallity, we have group of 3 or 4 points that are real real close to each other but these clusters are considerably away from the other groups. I can't precise exactly how far apart but I imagined that an equally spaced distance should be a good start, at least – Pedro de Sá May 17 '22 at 18:08
  • And also, thank you so much for your time and examples provided! I'll carefully take a look at them asap!! – Pedro de Sá May 17 '22 at 18:09

2 Answers2

2

This kind of problems can be handled in various ways such as force based method or geometrical method; Here, geometrical one is used.

I have considered the points to be as circles, so an arbitrary diameter can be specified for them, and also the area size for plotting with matplotlib. So we have the following codes for the beginning:

import numpy as np
from scipy.spatial import cKDTree
from shapely.geometry.polygon import Polygon
import matplotlib.pyplot as plt

np.random.seed(85)

# Points' coordinates and the specified diameter
coords = np.array([[3, 4], [7, 8], [3, 3], [1, 8], [5, 4], [3, 5], [7, 7]], dtype=np.float64)
points_dia = 1.1             # can be chosen arbitrary based on the problem

# Polygon creation
polygon_verts = np.array([[0, 0], [8.1, 0], [8.1, 8.1], [0, 8.1]])
p = Polygon(polygon_verts)

# Plotting
x, y = coords.T
colors = np.random.rand(coords.shape[0])
area = 700                   # can be chosen arbitrary based on the problem
min_ = coords.min() - 2*points_dia
max_ = coords.max() + 2*points_dia
plt.xlim(min_, max_)
plt.ylim(min_, max_)
xc, yc = p.exterior.xy
plt.plot(xc, yc)
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()

I separate the total answer to two steps:

  • dispersing (moving away) the suspected points from average position coordinates
  • curing the point-polygon overlaps

First step

For the first part, based on the number of points and neighbors, we can choose between Scipy and other libraries e.g. Scikit learn (my last explanations on this topic 1 (4th-5th paragraphs) and 2 can be helpful for such selections). Based on the (not large) sizes that the OP is mentioned in the comment, I will suggest using scipy.spatial.cKDTree, which has the best performance in this regard based on my experiences, to query the points. Using cKDTree.query_pairs we specify groups of points that are in the distance range of the diameter value from each other. Then by looping on the groups and averaging each group's points' coordinates, we can again use cKDTree.query to find the nearest point (k=1) of that group to the specified averaged coordinate. Now, it is easy to calculate distance of other points of that group to the nearest one and move them outward by a specified distance (here just as much as the overlaps):

nears = cKDTree(coords).query_pairs(points_dia, output_type='ndarray')
near_ids = np.unique(nears)
col1_ids = np.unique(nears[:, 0])

for i in range(col1_ids.size):

    pts_ids = np.unique(nears[nears[:, 0] == i].ravel())
    center_coord = np.average(coords[pts_ids], axis=0)
    nearest_center_id = pts_ids[cKDTree(coords[pts_ids]).query(center_coord, k=1)[1]]
    furthest_center_ids = pts_ids[pts_ids != nearest_center_id]

    vecs = coords[furthest_center_ids] - coords[nearest_center_id]
    dists = np.linalg.norm(vecs, axis=1) - points_dia
    coords[furthest_center_ids] = coords[furthest_center_ids] + vecs / np.linalg.norm(vecs, axis=1) * np.abs(dists)

Second step

For this part, we can loop on the modified coordinates (in the previous step) and find the two nearest coordinates (k=2) of the polygon's edges and project the point to that edge to find the closest coordinate on that line. So, again as step 1, we calculate overlaps and move the points to placing them inside the polygon. The points that where single (not in groups) will be moved shorter distances. This distances can be specified as desired and I have set them as a default value:

for i, j in enumerate(coords):
    for k in range(polygon_verts.shape[0]-1):
        nearest_poly_ids = cKDTree(polygon_verts).query(j, k=2)[1]

        vec_line = polygon_verts[nearest_poly_ids][1] - polygon_verts[nearest_poly_ids][0]
        vec_proj = j - polygon_verts[nearest_poly_ids][0]
        line_pnt = polygon_verts[nearest_poly_ids][0] + np.dot(vec_line, vec_proj) / np.dot(vec_line, vec_line) * vec_line

        vec = j - line_pnt
        dist = np.linalg.norm(vec)

        if dist < points_dia / 2:
            if i in near_ids:
                coords[i] = coords[i] + vec / dist * 2 * points_dia
            else:
                coords[i] = coords[i] + vec / dist

enter image description here


These codes, perhaps, could be optimized to get better performances, but it is not of importance for this question. I have checked it on my small example and need to be checked by larger cases. In all conditions, I think this will be one of the best ways to achieve the goal (may some changes based on the need) and can be a template which can be modified to satisfy any other needs or any shortcomings.

Ali_Sh
  • 2,667
  • 3
  • 43
  • 66
  • 1
    Hey thank you for your answer! I was just wondering, `col1_ids.size` would be the N=329 points I mentioned on my previous comment? Also, how did you get `nears` array on step 1? – Pedro de Sá May 19 '22 at 01:43
  • @PedrodeSá , Pardon me, I correct it. It was forgotten during copy and pasting here. Now, it is working. – Ali_Sh May 19 '22 at 03:35
  • @PedrodeSá , Did you check it? If any problem, please provide your data (position of the spheres and …) to adjust this code for them. It may need to be tuned based on your input data shapes. Save the data by `np.save` as `npy` format and put a reference link to them here to modify the code based on them. – Ali_Sh May 21 '22 at 03:28
1

so this is not particularly fast due to the loop checking for "point in polygon", but as long as you don't intend to ave a very fine grid-spacing something like this would do the job:

from scipy.interpolate import NearestNDInterpolator 

# define a regular grid
gx, gy = np.meshgrid(np.linspace(50, 200, 100), 
                     np.linspace(100, 200, 100))

# get a interpolation function
# (just used "xs" as values in here...)
i = NearestNDInterpolator(np.column_stack((xs, ys)), xs)

# crop your grid to the shape
newp = []
for x, y in zip(gx.flat, gy.flat):
    pt = Point(x, y)
    if p.contains(pt):
        newp.append(pt.xy)

newp = np.array(newp).squeeze()

# evaluate values on the grid
vals = i(*newp.T)

ax1.scatter(*newp.T, c=vals, zorder=0)

enter image description here

raphael
  • 2,159
  • 17
  • 20
  • hey, thank you for your answer! As far as I understood it, this would give me a grid with new equally spaced points, right? But my original points (N=100, on the example above) would stil be in its original position. Forgive me if I got something wrong. – Pedro de Sá May 17 '22 at 20:45
  • I've tried to find the nearest grid points, defined using your solution, using this [question](https://stackoverflow.com/questions/10818546/finding-index-of-nearest-point-in-numpy-arrays-of-x-and-y-coordinates), but my original points are so close to each other that they end up having the same neighbours on the grid (even creating a grid with many points in `np.linspace`). I believe the solution revolves around spacing the original points themselves unfortunately – Pedro de Sá May 17 '22 at 21:08
  • Hey, so I'm also still a bit puzzled what's your ultimate goal here... is it just about plotting or is there a particular reason why you need the points to be located on a regular grid? if the initial points are very close together you'd anyway need to use a very fine grid to ensure that each one has a unique neighbour... (thus having the same problem again...) (If it's just about plotting, why not simply zoom-in on the areas where the points are too dense? (e.g. something like [this](https://eomaps.readthedocs.io/en/latest/api.html#inset-maps-zoom-in-on-interesting-areas) ) – raphael May 18 '22 at 07:48
  • The final goal is to use a heatmap on the map to see how a specific variable changes over lat and long. The problem I'm facing right now is because as I have too many points really close to each other, the heatmap has more than one value pratically at the same (lat,long) pair. Thats why I'm looking to evenly distribute them, so all variable values would be visible – Pedro de Sá May 18 '22 at 16:38
  • ok, so without knowing what your values represent I don't really know what's best to do here... but if your data represents actual measurements at certain locations, shifting the coordinates of the points would distort the plot and it would no longer represent the spatial distribution of your data. why go for a heatmap if it is not suitable for visualizing the spatial variabilitiy of your data? – raphael May 18 '22 at 17:18
  • The values are only a float number [0.5, 1.0]. I'd like a heatmap because the specific location (like, centismals of lat long pair) isn't too important, but the area where the point is itself would be helpful to understand things like: the area is near a forest or the surroundings of the point is near a specific location, etc. The heatmap would help me to better understand the low values for my number (near 0.5) due to terrain and things like that – Pedro de Sá May 18 '22 at 22:34