I have a set of datapoints stored as a CSR matrix, and a list of clusters of points from that dataset, each also stored as a CSR matrix. I need to go through the datapoints and work out which cluster each one belongs to. There are around 8000 datapoints in total.

I tried looping through the datapoints and clusters and using the `in` keyword:

       for c in cluster:
           if datapoint.toarray() in c.toarray():
               # do stuff

But the `in` test returns True no matter what. Does anyone have a more efficient method than checking element by element?

Note: dataset and cluster are CSR matrices. toarray() is a csr matrix method from the scipy library.
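A small dense example may help show why the `in` test can pass for a row that is not actually present. This is only a sketch with made-up arrays (`cluster_rows` and `absent_row` are hypothetical stand-ins for `c.toarray()` and `datapoint.toarray()`), and it assumes the datapoint is a single row with the same number of columns as the cluster. NumPy evaluates `x in arr` roughly as `(arr == x).any()`, so one matching element is enough, and sparse data share a lot of zeros.

```python
import numpy as np

# Hypothetical dense stand-ins for c.toarray() and datapoint.toarray().
cluster_rows = np.array([[1, 0, 2],
                         [0, 3, 0]])
absent_row = np.array([[9, 0, 9]])  # not a row of cluster_rows

# `in` on NumPy arrays behaves like (cluster_rows == absent_row).any():
# the shared 0 in the middle column is enough to make it True.
loose = absent_row in cluster_rows

# A row-wise test: every element of some row has to match.
strict = bool((cluster_rows == absent_row).all(axis=1).any())

print(loose, strict)  # True False
```

If the real arrays behave the same way, the always-True result comes from matching zeros rather than matching rows.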

omri
  • It might be useful to include an example of what `cluster` and `datapoint` look like. Also, how does the function toarray() work? If you give more details it will be easier to obtain a relevant answer – gionni Dec 06 '20 at 17:31
  • cluster and datapoint are csr_matrix objects. toarray() is a method of the scipy.sparse.csr_matrix class. I am not so familiar with working with sparse matrices, unfortunately, hence my difficulty – omri Dec 06 '20 at 17:48
  • Great, you should edit the question and add this information, and also include some example data in the question, so that the burden is not on those who answer. As a pointer, you can check this [link](https://stackoverflow.com/questions/14766194/testing-whether-a-numpy-array-contains-a-given-row), which gives some ideas on how to check whether an array contains a given row. – gionni Dec 06 '20 at 17:53
  • That link is actually where I got the in keyword from, but for some reason it is always passing when I use it on my sparse matrices. I will update the post with the information I mentioned shortly. – omri Dec 06 '20 at 17:56
  • With the `toarray()` use, you are no longer testing sparse matrices, but just numpy arrays. Construct a couple of sample arrays/lists, and get that step working. You might as well do `datapoint.toarray()` once before the loop. – hpaulj Dec 06 '20 at 18:01
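Following the comments above, one possible way to avoid both `toarray()` and element-by-element checks is to turn each sparse row into a hashable key and look it up in a dictionary. The sketch below is not from the thread: the helper names `row_keys` and `assign_clusters` are hypothetical, and it assumes the dataset is one CSR matrix with one datapoint per row, that `clusters` is a list of CSR matrices with the same number of columns, and that cluster rows are exact copies of dataset rows (exact equality, no tolerance).

```python
import numpy as np
from scipy.sparse import csr_matrix

def row_keys(mat):
    """Yield one hashable key per row of a sparse matrix (hypothetical helper)."""
    mat = csr_matrix(mat, copy=True)  # work on a copy, leave the input untouched
    mat.sum_duplicates()              # merge repeated (row, col) entries
    mat.sort_indices()                # column indices in ascending order per row
    mat.eliminate_zeros()             # drop explicitly stored zeros
    for start, end in zip(mat.indptr[:-1], mat.indptr[1:]):
        cols = mat.indices[start:end].astype(np.int64)
        vals = mat.data[start:end].astype(np.float64)
        yield (cols.tobytes(), vals.tobytes())

def assign_clusters(datapoints, clusters):
    """Return a cluster index per datapoint row, or -1 if no cluster has it."""
    lookup = {}
    for cluster_id, cluster in enumerate(clusters):
        for key in row_keys(cluster):
            lookup.setdefault(key, cluster_id)  # first cluster wins on ties
    return [lookup.get(key, -1) for key in row_keys(datapoints)]

# Made-up example data.
datapoints = csr_matrix([[1, 0, 2], [0, 3, 0], [0, 0, 4], [5, 0, 0]])
clusters = [csr_matrix([[0, 3, 0], [5, 0, 0]]), csr_matrix([[1, 0, 2]])]

labels = assign_clusters(datapoints, clusters)
print(labels)  # [1, 0, -1, 0]
```

Each row is hashed once, so the cost grows with the number of stored values rather than with datapoints times clusters times columns. If the clustered rows were rescaled or rounded, exact matching would fail and a different comparison would be needed.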

0 Answers