This kind of problems can be handled in various ways such as force based method or geometrical method; Here, geometrical one is used.
I have considered the points to be as circles, so an arbitrary diameter can be specified for them, and also the area
size for plotting with matplotlib. So we have the following codes for the beginning:
import numpy as np
from scipy.spatial import cKDTree
from shapely.geometry.polygon import Polygon
import matplotlib.pyplot as plt
np.random.seed(85)
# Points' coordinates and the specified diameter
coords = np.array([[3, 4], [7, 8], [3, 3], [1, 8], [5, 4], [3, 5], [7, 7]], dtype=np.float64)
points_dia = 1.1 # can be chosen arbitrary based on the problem
# Polygon creation
polygon_verts = np.array([[0, 0], [8.1, 0], [8.1, 8.1], [0, 8.1]])
p = Polygon(polygon_verts)
# Plotting
x, y = coords.T
colors = np.random.rand(coords.shape[0])
area = 700 # can be chosen arbitrary based on the problem
min_ = coords.min() - 2*points_dia
max_ = coords.max() + 2*points_dia
plt.xlim(min_, max_)
plt.ylim(min_, max_)
xc, yc = p.exterior.xy
plt.plot(xc, yc)
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()
I separate the total answer to two steps:
- dispersing (moving away) the suspected points from average position coordinates
- curing the point-polygon overlaps
First step
For the first part, based on the number of points and neighbors, we can choose between Scipy and other libraries e.g. Scikit learn (my last explanations on this topic 1 (4th-5th paragraphs) and 2 can be helpful for such selections). Based on the (not large) sizes that the OP is mentioned in the comment, I will suggest using scipy.spatial.cKDTree
, which has the best performance in this regard based on my experiences, to query the points. Using cKDTree.query_pairs
we specify groups of points that are in the distance range of the diameter value from each other. Then by looping on the groups and averaging each group's points' coordinates, we can again use cKDTree.query
to find the nearest point (k=1
) of that group to the specified averaged coordinate. Now, it is easy to calculate distance of other points of that group to the nearest one and move them outward by a specified distance (here just as much as the overlaps):
nears = cKDTree(coords).query_pairs(points_dia, output_type='ndarray')
near_ids = np.unique(nears)
col1_ids = np.unique(nears[:, 0])
for i in range(col1_ids.size):
pts_ids = np.unique(nears[nears[:, 0] == i].ravel())
center_coord = np.average(coords[pts_ids], axis=0)
nearest_center_id = pts_ids[cKDTree(coords[pts_ids]).query(center_coord, k=1)[1]]
furthest_center_ids = pts_ids[pts_ids != nearest_center_id]
vecs = coords[furthest_center_ids] - coords[nearest_center_id]
dists = np.linalg.norm(vecs, axis=1) - points_dia
coords[furthest_center_ids] = coords[furthest_center_ids] + vecs / np.linalg.norm(vecs, axis=1) * np.abs(dists)
Second step
For this part, we can loop on the modified coordinates (in the previous step) and find the two nearest coordinates (k=2
) of the polygon's edges and project the point to that edge to find the closest coordinate on that line. So, again as step 1, we calculate overlaps and move the points to placing them inside the polygon. The points that where single (not in groups) will be moved shorter distances. This distances can be specified as desired and I have set them as a default value:
for i, j in enumerate(coords):
for k in range(polygon_verts.shape[0]-1):
nearest_poly_ids = cKDTree(polygon_verts).query(j, k=2)[1]
vec_line = polygon_verts[nearest_poly_ids][1] - polygon_verts[nearest_poly_ids][0]
vec_proj = j - polygon_verts[nearest_poly_ids][0]
line_pnt = polygon_verts[nearest_poly_ids][0] + np.dot(vec_line, vec_proj) / np.dot(vec_line, vec_line) * vec_line
vec = j - line_pnt
dist = np.linalg.norm(vec)
if dist < points_dia / 2:
if i in near_ids:
coords[i] = coords[i] + vec / dist * 2 * points_dia
else:
coords[i] = coords[i] + vec / dist

These codes, perhaps, could be optimized to get better performances, but it is not of importance for this question. I have checked it on my small example and need to be checked by larger cases. In all conditions, I think this will be one of the best ways to achieve the goal (may some changes based on the need) and can be a template which can be modified to satisfy any other needs or any shortcomings.