-1

I am working with a huge 2d dataset and need a range query for every point, returning the neighbours within a range as a set I already have tested using an index with KD Tree form sk learn, but the problem is, it returns the index as a list and the converting to a set takes too long. Is there a data structure, which returns the points from a range query as a set and not as a list?

Felix Ha
  • 401
  • 5
  • 11
  • 1
    Are you sure that constructing a Set from a List takes too long? You make it sound like it takes as long as or even longer than the range query itself, in that case I suspect there is something wrong with your creation of a set. Maybe you could show us some timing numbers? Also, every spatial tree would also have to internally add the result points to a Set, so that cannot really be faster than when you iterate over a list to create a Set... – TilmannZ May 28 '17 at 11:11
  • to your question here is the answer https://stackoverflow.com/questions/44224696/convert-np-array-to-a-set-takes-too-long :) – Felix Ha May 30 '17 at 17:07
  • Well the reference tells only why conversion to a Set takes longer than conversion to a List. My question was whether it is really the bottleneck, usually range query should take longer than the creation of a set or list. Also, as I mentioned, if you want a Set, someone has to bear the cost of creating it, whether it is you or whether it is the KD-Tree doing it internally... – TilmannZ May 31 '17 at 18:30

1 Answers1

1

The result is not natively a list.

Get the source code of the k-d-tree, and modify so it directly writes to a set, instead of a list.

But I highly doubt this will solve your actual problem. Converting a small list to a set should barely be a performance issue... but well, you are using python. A traditional python set() will be a lot lot lot slower than an numpy array. But don't blame the data structure for not using a slow set.

Has QUIT--Anony-Mousse
  • 76,138
  • 12
  • 138
  • 194