I am working with two large datasets (coordinates of three-dimensional points). The first dataset (Data1) has about one million points, while the second (Data2) has about fifty million. I need to do pair counts and range queries between the two datasets. At present I am using scipy's cKDTree, which I've read is fast.
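from scipy.spatial import cKDTree
# Data1 (~1 million points) and Data2 (~50 million points) hold 3D coordinates
KDTree_Data1 = cKDTree(Data1)
KDTree_Data2 = cKDTree(Data2)
print("KDTree created")
# For every point in Data1, list the indices of all Data2 points within r
Data2_Indx = KDTree_Data1.query_ball_tree(KDTree_Data2, r=170, p=2.0, eps=0)
print("Something")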
When I run this script on test cases (small Data1 and Data2), it runs fine, but when I run it on the actual data (~1 million points in Data1 and ~50 million in Data2), I get the following output and error message:
KDTree created
Killed: 9
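Both trees are built, and the process is then killed with signal 9 (SIGKILL) during the query_ball_tree call, so I suspect that query_ball_tree requires more memory than my computer has. I am working on a Mac with 16 GB of memory (1867 MHz DDR3) and a 3.1 GHz Intel Core i7 processor.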
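One way to check this suspicion without running the full query might be to count neighbours for a small random sample of Data1 and extrapolate. The sketch below is only an illustration. The helper name `estimate_result_size`, the sample size, and the synthetic arrays are assumptions and not part of my actual script, and the figure of 8 bytes per stored index is a lower bound that ignores Python list and integer overhead.

```python
import numpy as np
from scipy.spatial import cKDTree


def estimate_result_size(data1, tree2, r, sample_size=1000, seed=0):
    """Extrapolate the total neighbour count from a random sample of data1.

    Hypothetical helper: returns (estimated pairs, lower-bound size in GB).
    """
    rng = np.random.RandomState(seed)
    sample_size = min(sample_size, len(data1))
    idx = rng.choice(len(data1), size=sample_size, replace=False)
    # One list of neighbour indices per sampled point.
    neighbours = tree2.query_ball_point(data1[idx], r)
    mean_per_point = np.mean([len(n) for n in neighbours])
    total_pairs = mean_per_point * len(data1)
    # Lower bound: 8 bytes per stored index on a 64-bit machine.
    return total_pairs, total_pairs * 8 / 1e9


# Small synthetic stand-ins for Data1 and Data2 (assumed shape: (N, 3)).
rng = np.random.RandomState(1)
data1 = rng.uniform(0, 1000, size=(500, 3))
data2 = rng.uniform(0, 1000, size=(5000, 3))
tree2 = cKDTree(data2)
pairs, gigabytes = estimate_result_size(data1, tree2, r=170, sample_size=100)
print("about %d pairs, at least %.6f GB" % (pairs, gigabytes))
```

If the estimate for the real data comes out well above 16 GB, the result of query_ball_tree cannot fit in memory on this machine, whatever the tree implementation.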
Can anyone suggest an alternative to cKDTree (or to query_ball_tree) that requires less memory? Any help would be appreciated.
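For reference, the kind of lower-memory approach I have in mind is to build a tree over Data2 only, query Data1 in chunks with `query_ball_point`, and keep just the per-point counts instead of every index list. This is a minimal sketch under assumptions: the function name `count_pairs_in_chunks`, the chunk size, and the synthetic data are made up for illustration, and it only covers the pair-count case, not queries that need the indices themselves.

```python
import numpy as np
from scipy.spatial import cKDTree


def count_pairs_in_chunks(data1, data2, r, chunk_size=10000):
    """Count data2 points within distance r of each data1 point, chunk by chunk."""
    tree2 = cKDTree(data2)  # build the tree over the larger set only once
    counts = np.zeros(len(data1), dtype=np.int64)
    for start in range(0, len(data1), chunk_size):
        stop = min(start + chunk_size, len(data1))
        # Index lists exist only for this chunk and are discarded afterwards.
        neighbours = tree2.query_ball_point(data1[start:stop], r, p=2.0)
        counts[start:stop] = [len(n) for n in neighbours]
    return counts


# Small synthetic stand-ins for Data1 and Data2 (assumed shape: (N, 3)).
rng = np.random.RandomState(0)
data1 = rng.uniform(0, 1000, size=(200, 3))
data2 = rng.uniform(0, 1000, size=(2000, 3))
counts = count_pairs_in_chunks(data1, data2, r=170, chunk_size=50)
print(counts.sum())
```

Peak memory for the results then depends on the chunk size rather than on the size of Data1, at the cost of a Python-level loop. If only the total number of pairs is needed, `cKDTree.count_neighbors` may be a simpler option, since it returns a count without building index lists.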