Fastest way to remove identical sub-arrays in an nd-array?

Question

Let's consider a 2d-array A:

2   3   5   7
2   3   5   7
1   7   1   4
5   8   6   0
2   3   5   7

The first, second and last lines are identical. The algorithm I'm looking for should return a 2d-array that keeps only one copy of each set of identical lines, along with the number of identical lines for each line of the resulting 2d-array. I currently use an inefficient naive algorithm to do that:

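import numpy

A = numpy.array([[2, 3, 5, 7],
                 [2, 3, 5, 7],
                 [1, 7, 1, 4],
                 [5, 8, 6, 0],
                 [2, 3, 5, 7]])
counts = []                  # number of identical lines for each kept line
i = 0
end = len(A)
while i < end:
    j = i + 1
    numberID = 1             # counts line i itself
    while j < end:
        if numpy.array_equal(A[i, :], A[j, :]):
            A = numpy.delete(A, j, axis=0)   # drop the duplicate; later rows shift up
            end -= 1
            numberID += 1
        else:
            j += 1
    counts.append(numberID)
    i += 1
print(A, len(A))
print(numpy.array(counts))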

Expected result:

array([[2, 3, 5, 7],
       [1, 7, 1, 4],
       [5, 8, 6, 0]]) # 2d-array with identical lines removed
array([3, 1, 1]) # number of identical lines for each row above

This algorithm runs native Python loops over NumPy arrays (and `numpy.delete` returns a new copy of the array on every deletion), so it is inefficient. Thanks for any help.
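A minimal vectorized sketch, assuming NumPy 1.13 or later (where `numpy.unique` accepts `axis=0` and treats each row as one item). `numpy.unique` returns the unique rows sorted lexicographically, so `return_index` is used to put them back in first-occurrence order to match the expected result.

```python
import numpy as np

A = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

# axis=0: compare whole rows. Returns the sorted unique rows, the index of
# each row's first occurrence in A, and how many times each row occurs.
rows, first_idx, counts = np.unique(A, axis=0, return_index=True,
                                    return_counts=True)

# Reorder so the unique rows appear in the order they first occur in A.
order = np.argsort(first_idx)
unique_rows = rows[order]
unique_counts = counts[order]

print(unique_rows)
print(unique_counts)
```

If the original row order does not matter, `rows` and `counts` can be used directly and the `argsort` step dropped.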

To get the counts: http://stackoverflow.com/q/10741346/2379410 — , Oct 14 '14 at 16:31
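A sketch of getting both the unique rows and their counts by sorting, without relying on the `axis` argument of `numpy.unique`. It assumes exactly comparable data such as integers. After a lexicographic row sort, identical rows become adjacent, so counting runs of equal neighbours gives the counts in O(n log n) instead of the O(n²) pairwise comparisons of the naive loop.

```python
import numpy as np

A = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

# Sort rows lexicographically. np.lexsort uses its LAST key as the primary
# key, so the columns are passed in reverse order. The sort is stable.
order = np.lexsort(A.T[::-1])
S = A[order]

# A sorted row starts a new group if it differs from the previous sorted
# row in at least one column.
is_new = np.ones(len(S), dtype=bool)
is_new[1:] = np.any(S[1:] != S[:-1], axis=1)
starts = np.flatnonzero(is_new)

# Group sizes are the gaps between consecutive group starts.
counts = np.diff(np.append(starts, len(S)))

# With a stable sort, the first row of each group is its first occurrence
# in A; reorder the groups by that first occurrence.
first_idx = order[starts]
by_first = np.argsort(first_idx)
unique_rows = A[first_idx[by_first]]
unique_counts = counts[by_first]

print(unique_rows)
print(unique_counts)
```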
Not quite the right answer for counting the number of identical sub-arrays. Still looking for leads ;-) — sol, Oct 15 '14 at 14:02
I'd say you now have more than leads: the answers to the question I linked and the one marked as duplicate provide the ingredients to do this in the fastest way, you just need to combine them. If you like something simpler that's still reasonably fast, look at `Counter` from the [`collections`](https://docs.python.org/2/library/collections.html#collections.Counter) module. E.g.: `Counter(map(tuple, A))` — , Oct 16 '14 at 03:19
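A small sketch of the `Counter(map(tuple, A))` suggestion, extended to produce both outputs from the question. It assumes Python 3.7 or later, where `Counter` (a `dict` subclass) keeps keys in first-insertion order, so the rows come out in first-occurrence order. It loops in Python, so it is simpler but generally slower than a vectorized NumPy approach on large arrays.

```python
from collections import Counter

import numpy as np

A = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

# Rows are not hashable, so turn each one into a tuple; Counter then
# tallies identical tuples, keeping them in first-seen order.
counter = Counter(map(tuple, A))

unique_rows = np.array(list(counter.keys()))
unique_counts = np.array(list(counter.values()))

print(unique_rows)
print(unique_counts)
```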
