There are very useful notes on this topic in my previous question; I think you can find your answer there. In summary, I think cKDTree from SciPy and BallTree from scikit-learn are the best choices, but you can also do this with pure NumPy (as mentioned in that SO post). Check R-tree's performance on your data as well (it did not perform better for me, but it is recommended in some places).
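As a rough illustration of the tree-based and pure NumPy options, here is a minimal sketch that runs the same fixed-radius search both ways and checks that the results agree. The random data, the array sizes, and the reading of d as a search radius are my assumptions, not taken from the question.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical data: 1000 reference points and 5 query points in 3D.
rng = np.random.default_rng(0)
points = rng.random((1000, 3))
queries = rng.random((5, 3))
d = 0.1  # assumed meaning of d: the search radius

# Tree-based search: one list of neighbor indices per query point.
tree = cKDTree(points)
tree_hits = tree.query_ball_point(queries, r=d)

# Pure NumPy brute force: full (n_queries, n_points) distance matrix.
dists = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=-1)
brute_hits = [np.flatnonzero(row <= d) for row in dists]

# Both approaches should report the same neighbors for every query.
same = all(sorted(t) == b.tolist() for t, b in zip(tree_hits, brute_hits))
print(same)
```

The brute-force version builds the whole distance matrix, so its memory use grows with the product of the two point counts; that is usually the reason to prefer a tree for larger inputs.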
In my experience, cKDTree is much faster than KDTree in SciPy. However, as I pointed out in my answer, cKDTree can lead to memory leaks and longer runtimes, particularly as the search domain grows (as here, when you increased d). In that case, I suggest using BallTree as d increases.
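For reference, a minimal sketch of the same kind of radius search with BallTree, repeated for a few growing values of d. The helper name neighbor_counts, the data, and the radii are hypothetical, and d is again assumed to be the search radius.

```python
import numpy as np
from sklearn.neighbors import BallTree

# Hypothetical data, as a stand-in for the real point set.
rng = np.random.default_rng(0)
points = rng.random((1000, 3))
queries = rng.random((5, 3))

def neighbor_counts(points, queries, radii):
    """Count neighbors within each radius d for every query point."""
    tree = BallTree(points, leaf_size=40)  # build once, reuse for all radii
    return {
        d: tree.query_radius(queries, r=d, count_only=True)
        for d in radii
    }

counts = neighbor_counts(points, queries, (0.05, 0.2, 0.5))
for d, c in counts.items():
    print(d, c)
```

With count_only=True only the number of neighbors is returned, which avoids materializing large index arrays when d is big; drop that argument if you need the indices themselves.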
Furthermore, another SO question of mine and its answer may also be helpful on this topic.
For further evaluation, you need to provide an example, the approaches you have tried with their measured runtimes, and the runtime range you expect for that example. It is better to prepare an example whose data volume is similar to that of your real data, for the reason mentioned above.
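One possible shape for such an example is a small, reproducible timing script like the sketch below. The function name time_radius_query, the random data, and the chosen sizes and radii are all hypothetical; replace them with values that match your real data volume.

```python
import time
import numpy as np
from scipy.spatial import cKDTree

def time_radius_query(n_points, d, n_queries=100, dim=3, seed=0):
    """Time tree construction plus one radius query; return (seconds, hits)."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))    # placeholder for the real data
    queries = rng.random((n_queries, dim))

    start = time.perf_counter()
    tree = cKDTree(points)
    hits = tree.query_ball_point(queries, r=d)
    elapsed = time.perf_counter() - start

    total = sum(len(h) for h in hits)       # total neighbors found
    return elapsed, total

# Vary both the data volume and the radius d (assumed to be the search radius).
results = {
    (n, d): time_radius_query(n, d)
    for n in (1_000, 10_000)
    for d in (0.05, 0.2)
}
for (n, d), (elapsed, total) in results.items():
    print(f"n={n:>6} d={d:<4} time={elapsed:.4f}s neighbors={total}")
```

Posting the script together with its printed timings and the runtime you are aiming for makes it much easier for others to compare alternatives on equal terms.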