I have a list of unique rows and another, larger array of data (called test_rows in the example below). I was wondering if there was a faster way to get the locations of each unique row in the data. The fastest way I could come up with is...
import numpy
uniq_rows = numpy.array([[0, 1, 0],
                         [1, 1, 0],
                         [1, 1, 1],
                         [0, 1, 1]])
test_rows = numpy.array([[0, 1, 1],
                         [0, 1, 0],
                         [0, 0, 0],
                         [1, 1, 0],
                         [0, 1, 0],
                         [0, 1, 1],
                         [0, 1, 1],
                         [1, 1, 1],
                         [1, 1, 0],
                         [1, 1, 1],
                         [0, 1, 0],
                         [0, 0, 0],
                         [1, 1, 0]])
# this gives me the indices of each group of unique rows
for row in uniq_rows.tolist():
    print(row, numpy.where((test_rows == row).all(axis=1))[0])
This prints...
[0, 1, 0] [ 1 4 10]
[1, 1, 0] [ 3 8 12]
[1, 1, 1] [7 9]
[0, 1, 1] [0 5 6]
Is there a better or more numpythonic (not sure if that word exists) way to do this? I was searching for a numpy group function but could not find one. Basically, for any incoming dataset I need the fastest way to get the locations of each unique row in that dataset. The incoming dataset will not always contain every unique row, and its length will vary.
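For reference, the most obvious alternative I can think of is a single pass in plain Python, grouping row indices in a dict keyed on row tuples instead of scanning test_rows once per unique row (just a sketch, I have not benchmarked it; the indices come back as plain lists rather than arrays):

from collections import defaultdict

groups = defaultdict(list)
for i, row in enumerate(test_rows.tolist()):
    groups[tuple(row)].append(i)   # one pass over the data

for row in uniq_rows.tolist():
    print(row, groups.get(tuple(row), []))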
EDIT: This is just a simple example. In my application the values would not be just zeros and ones; they could be anywhere from 0 to 32000. The size of uniq_rows could be between 4 and 128 rows, and the size of test_rows could be in the hundreds of thousands.
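Given those sizes (few unique rows, many test rows), a fully vectorized version via broadcasting might also work, at the cost of an intermediate boolean array of shape (len(uniq_rows), len(test_rows)) (again just a sketch, assuming that much memory is acceptable):

# compare every unique row against every test row in one shot;
# matches[i, j] is True when test_rows[j] equals uniq_rows[i]
matches = (uniq_rows[:, None, :] == test_rows[None, :, :]).all(axis=2)
for row, hits in zip(uniq_rows.tolist(), matches):
    print(row, numpy.where(hits)[0])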