I have around 500,000 32 dimensional vectors (normalized with mean=0, std=1) that I currently store in a KD tree to efficiently find the nearest neighbor. However, now I also want to be able to exclude some of the vectors from the database dynamically for some queries (the condition changes often, so re-building the tree is not an option). I want to fix some of the 32 dimensions to a certain range depending on some conditions that change during runtime.
What I currently do, is instead of looking for k=1 nearest neighbors, I look for k=50 (or more) nearest neighbors and then iterate from the closest to the farthest until I find one that matches the condition. Unfortunately that is not a very elegant solution as it requires the query to return k=50 matches even if k=1 would already have returned the one I am looking for. Also, if k=50 was too small, I need to do another query with k=500 or so and that hurts performance.
So two solutions come to my mind:
Find a KD tree implementation that returns an iterator instead of a fixed-size result with k entries. The iterator would start at the nearest neighbor and then move towards neighbors with greater distance. Due to the design of a KDTree this should be very efficient. Then the tree only needs to be searched until a valid result is found and no fixed k needs to be specified. I was not able to find a Python implementation for that so far.
Use a different data structure or database (MySQL for example) that is designed to do queries based on conditions. Is there any database system (I am also open for NoSQL) that supports efficient nearest neighbor search using dynamic conditions? Maybe a database that allows to use a KD tree as the index?
If nothing is yet available, I'll probably give myself a try to implement a KD tree that does what I want on my own.
EDIT: The language I am currently using is Python for proto-typing, later I will move to C# (Unity).