I have a set of datapoints in CSR matrix format, and a list of clusters of points from that dataset, each also in CSR matrix format. I need to go through the datapoints and figure out which cluster each one is in. There are around 8000 datapoints in total.
I tried looping through the datapoints and clusters and using the in keyword:
for c in cluster:
    if datapoint.toarray() in c.toarray():
        # do stuff
But the in test returns True no matter what. Does anyone have a more efficient method than checking element by element?
Note: dataset and cluster are CSR matrices. toarray() is a CSR matrix method from the scipy library.
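For reference, a minimal sketch of what may be happening and one possible workaround. For NumPy arrays, the in operator behaves like (c == datapoint).any(), so any single matching element, including the zeros that sparse rows share, makes it True. The sketch assumes each datapoint is one row of the dataset, each cluster is a CSR matrix whose rows are its member points, and membership means an exact row match. The names row_keys and assign_clusters are hypothetical helpers, not part of scipy.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Why `in` misleads: it is an elementwise comparison followed by any().
a = np.array([[1, 0, 5]])
b = np.array([[9, 0, 9]])
print(a in b)  # True, because the zeros in the middle column match


def row_keys(m):
    """Return one hashable key per row: (nonzero column indices, nonzero values)."""
    m = csr_matrix(m, copy=True)  # work on a copy so the caller's matrix is untouched
    m.sum_duplicates()            # merge repeated entries into canonical form
    m.eliminate_zeros()           # drop explicitly stored zeros
    m.sort_indices()              # make column order deterministic within each row
    keys = []
    for i in range(m.shape[0]):
        start, end = m.indptr[i], m.indptr[i + 1]
        cols = tuple(m.indices[start:end].tolist())
        vals = tuple(m.data[start:end].tolist())
        keys.append((cols, vals))
    return keys


def assign_clusters(datapoints, clusters):
    """Return, for each row of `datapoints`, the index of the first cluster
    containing an identical row, or -1 if no cluster contains it."""
    lookup = {}
    for label, c in enumerate(clusters):
        for key in row_keys(c):
            lookup.setdefault(key, label)  # keep the first cluster seen for a row
    return [lookup.get(key, -1) for key in row_keys(datapoints)]
```

With this approach every row is hashed once, so the cost grows with the number of stored nonzeros rather than with the number of datapoint/cluster pairs. Exact matching is only reasonable here because the cluster rows are assumed to be copies of dataset rows; values that were recomputed in floating point would need a tolerance-based comparison instead.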