5

I have n points in a 3D space. I want to stochastically sample a subset of points with all nearest-neighbor distances larger than r. The size of the subset m is unknown, but I want the sampled points to be as dense as possible, i.e. maximize m.

There are similar questions, but they are all about generating points, rather than sampling from given points.
Generate random points in 3D space with minimum nearest-neighbor distance

Generate 3-d random points with minimum distance between each of them?

Say I have 300 random 3D points,

import numpy as np
n = 300
points = np.random.uniform(0, 10, size=(n, 3))

What is the fastest way to get a subset of m points with minimum nearest-neighbor distance r = 1 while maximizing m?

Shaun Han
  • 2,676
  • 2
  • 9
  • 29
  • Are you interested in approximations or must the result be optimal? – orlp Jan 10 '21 at 21:17
  • 1
    Also this problem can aptly be described as "finding the maximum independent set in the unit disk (ball?) graph of a set of Euclidean 3D points". – orlp Jan 10 '21 at 21:20
  • In Jallu, R. K., & Das, G. K. "Improved Algorithm for Maximum Independent Set on Unit Disk Graph" the authors claim the problem (in 2D, which implies 3D) is NP-hard, citing the book Garey, M., Johnson, D., "Computers and intractability: a guide to the theory of NP-completeness" as the source. – orlp Jan 10 '21 at 21:25
  • An approximation that is fast enough is good for me, no need to be optimal. I understand it might take forever to find the global optimum given 300 points. – Shaun Han Jan 10 '21 at 21:46
  • No answers in the question, please. I have rolled back/edited your question and removed the answer. Add the answer in the answer section only. – Sabito stands with Ukraine Jan 18 '21 at 09:32
  • @Yatin the demonstration was to explain the bounty . . . – Daniel F Jan 18 '21 at 09:36
  • @DanielF Oh, I will have to ask around a bit on what to do in this situation. I still think that an answer should be in the answer section... But if the consensus is to leave it in then I will oblige and rollback to its previous state. – Sabito stands with Ukraine Jan 18 '21 at 09:38
  • @Yatin, please do rollback since the update added a lot of information that was useful. Until the author decides to update their answer (and edit OPs question accordingly, since clearly, he would have the priveledges to do so), it doesn't make sense to remove such vital info that helps in understanding the problem. – Akshay Sehgal Jan 18 '21 at 22:38
  • 1
    @Yatin It's okay. I just found that David's answer is not working when minimum distance is not 1, so no need to put the first update back. I have posted a new update in the answer section. – Shaun Han Jan 19 '21 at 00:19

4 Answers4

2

There's probably an efficient bicriteria approximation scheme, but why bother when integer programming is so quick on average?

import numpy as np

n = 300
points = np.random.uniform(0, 10, size=(n, 3))

from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver("SCIP")
variables = [solver.BoolVar("x[{}]".format(i)) for i in range(n)]
solver.Maximize(sum(variables))
for j, q in enumerate(points):
    for i, p in enumerate(points[:j]):
        if np.linalg.norm(p - q) <= 1:
            solver.Add(variables[i] + variables[j] <= 1)
solver.EnableOutput()
solver.Solve()
print(len([i for (i, variable) in enumerate(variables) if variable.SolutionValue()]))
David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
  • OP added some details into their question (rev [5](https://stackoverflow.com/revisions/65654970/5)) trying to justify why they accepted your answer. That is meta-commentary on the post itself and shouldn't be part of the question (hence I removed it). Because the output came from running your code, can you add those details into your answer. That way all parties concerned will be satisfied. – Sabito stands with Ukraine Jan 18 '21 at 10:03
  • Can you check my new testing result? Your code seems to only work when minimum distance is 1 for some reason. – Shaun Han Jan 19 '21 at 00:09
1

This isn't optimally large of a subset, but should be close while not taking very long, using KDTree to optimize the distance calculations:

from scipy.spatial import KDTree
import numpy as np

def space_sample(n = 300, low = 0, high = 10, dist = 1):
    points = np.random.uniform(low, high, size=(n, 3))
    k = KDTree(points)
    pairs = np.array(list(k.query_pairs(dist)))
    
    def reduce_pairs(pairs, remove = []):  #iteratively remove the most connected node
        p = pairs[~np.isin(pairs, remove).any(1)]
        if p.size >0:
            count = np.bincount(p.flatten(), minlength = n)
            r = remove + [count.argmax()]
            return reduce_pairs(p, r)
        else:
            return remove
    
    return np.array([p for i, p in enumerate(points) if not(i in reduce_pairs(pairs))])

subset = space_sample()

Iteratively removing the most connected node is not optimal (keeps about 200 of the 300 points), but is likely the fastest algorithm that's close to optimal (the only thing faster being just removing at random). You could possibly @njit reduce_pairs to make that part faster (will try if I have time later).

Daniel F
  • 13,620
  • 2
  • 29
  • 55
  • I tried your code. The result is decent, but it takes me an awful long time to sample a subset from 1000 points. Moreover, I need a code that explicitly calculates distance since in my real case I need to take care of periodic boundary condition. Therefore I cannot use ```KDTree``` – Shaun Han Jan 19 '21 at 00:13
0

Testing @David Eisenstat's answer with 30 given points:

from ortools.linear_solver import pywraplp
import numpy as np

def subset_David_Eisenstat(points, r):
    solver = pywraplp.Solver.CreateSolver("SCIP")
    variables = [solver.BoolVar("x[{}]".format(i)) for i in range(len(points))]
    solver.Maximize(sum(variables))
    for j, q in enumerate(points):
        for i, p in enumerate(points[:j]):
            if np.linalg.norm(p - q) <= r:
                solver.Add(variables[i] + variables[j] <= 1)
    solver.EnableOutput()
    solver.Solve()
    indices = [i for (i, variable) in enumerate(variables) if variable.SolutionValue()]
    return points[indices]

points = np.array(
[[ 7.32837882, 12.12765786, 15.01412241],
 [ 8.25164031, 11.14830379, 15.01412241],
 [ 8.21790113, 13.05647987, 13.05647987],
 [ 7.30031002, 13.08276009, 14.05452502],
 [ 9.18056467, 12.0800735 , 13.05183844],
 [ 9.17929647, 11.11270337, 14.03027534],
 [ 7.64737905, 11.48906945, 15.34274827],
 [ 7.01315886, 12.77870699, 14.70301668],
 [ 8.88132414, 10.81243313, 14.68685022],
 [ 7.60617372, 13.39792166, 13.39792166],
 [ 8.85967682, 12.72946394, 12.72946394],
 [ 9.50060705, 11.43361294, 13.37866092],
 [ 8.21790113, 12.08471494, 14.02824481],
 [ 7.32837882, 12.12765786, 16.98587759],
 [ 8.25164031, 11.14830379, 16.98587759],
 [ 7.30031002, 13.08276009, 17.94547498],
 [ 8.21790113, 13.05647987, 18.94352013],
 [ 9.17929647, 11.11270337, 17.96972466],
 [ 9.18056467, 12.0800735 , 18.94816156],
 [ 7.64737905, 11.48906945, 16.65725173],
 [ 7.01315886, 12.77870699, 17.29698332],
 [ 8.88132414, 10.81243313, 17.31314978],
 [ 7.60617372, 13.39792166, 18.60207834],
 [ 8.85967682, 12.72946394, 19.27053606],
 [ 9.50060705, 11.43361294, 18.62133908],
 [ 8.21790113, 12.08471494, 17.97175519],
 [ 7.32837882, 15.01412241, 12.12765786],
 [ 8.25164031, 15.01412241, 11.14830379],
 [ 7.30031002, 14.05452502, 13.08276009],
 [ 9.18056467, 13.05183844, 12.0800735 ],])

When the expected minimum distance is 1:

subset1 = subset_David_Eisenstat(points, r=1.)
print(len(subset1))
# Output: 18

Check the minimum distance:

from scipy.spatial.distance import cdist
dist = cdist(subset1, subset1, metric='euclidean')
# Delete diagonal
res = dist[~np.eye(dist.shape[0],dtype=bool)].reshape(dist.shape[0],-1)
print(np.min(res))
# Output: 1.3285513450926985

Change the expected minimum distance to 2:

subset2 = subset_David_Eisenstat(points, r=2.)
print(len(subset2))
# Output: 10

Check the minimum distance:

from scipy.spatial.distance import cdist
dist = cdist(subset2, subset2, metric='euclidean')
# Delete diagonal
res = dist[~np.eye(dist.shape[0],dtype=bool)].reshape(dist.shape[0],-1)
print(np.min(res))
# Output: 2.0612041004376223
Shaun Han
  • 2,676
  • 2
  • 9
  • 29
  • The problem is the line `solver.Add(variables[i] + variables[j] <= r)`. That `r` should remain `1` -- it's requiring that at most one of the two points appear in the chosen subset. – David Eisenstat Jan 19 '21 at 01:52
-1

This might not be too fast, but iterate the 3D distance formula, append to dictionary, sort it, then get id.

The 3D distance formula is given points (x, y, z) and (x1, y1, z1):

m = ((x-x1)^2 + (y-y1)^2)^0.5
d = (m^2 + (z-z1)^2)^0.5

(d is the total distance and ^0.5 is sqrt.)

Synthase
  • 5,849
  • 2
  • 12
  • 34
JB06
  • 1