9

Motivation:

I've seen this algorithm described, and I'd rather not reinvent the wheel if a standard implementation exists. I've also learned that if there is a scipy/numpy implementation, it is usually much faster than anything I can roll myself in python.

Algorithm Description

I have a large number of points on the plane (several million). Starting with a large box that encompasses all the points, I'd like to continuously subdivide the box into equal area sub-boxes. The subdivision continues recursively while there are at least 1,000 points in the sub-box. The algorithm returns a tree that describes the subdivisions and the mapping of the points to each leaf node of the tree.

What is the name of this algorithm (something like divide and conquer?), and is there a standard method of doing it when given a 2D numpy array of points?

Hooked
  • 84,485
  • 43
  • 192
  • 261
  • If the splitting in in one dimension it is a N-kD tree (for N=2). Also the splitting should be done such that the sizes *of the populations* of the two parts are (about) equal. – wildplasser Mar 21 '12 at 18:46
  • @wildplasser from a computational standpoint, I'd agree with you about the split along equal sized populations (for minimum tree depth). However, I'm trying to replicate the results of a paper where they do exactly as I say, a split of equal _areas_. – Hooked Mar 21 '12 at 18:48
  • Oops, that's ugly. But at least they stop splitting once size < 1000. BTW: even better is to split on a N-1 D hyperplane (perpendicular to the main axis of the hyper-ellipsoid formed by the points) – wildplasser Mar 21 '12 at 18:52

1 Answers1

12

It's called a quadtree partition. As far as Python code, see this thread.

As per the comment by Joe Kington, take a look at scipy.spatial.KDTree and/or scipy.spatial.cKDTree.

Ted Hopp
  • 232,168
  • 48
  • 399
  • 521
  • 6
    Just for whatever it's worth, `scipy.spatial.KDTree` (and/or `scipy.spatial.cKDTree`, which is written in C for performance reasons) is a far more robust choice than the options listed in the the thread you linked to, i.m.o. – Joe Kington Mar 21 '12 at 20:03
  • @JoeKington - Good to know. Something from scipy is probably better for OP anyway, given the tags. – Ted Hopp Mar 21 '12 at 20:09