In Python, I have a list of ndarrays and I want to count duplicate arrays in order to calculate an average for each duplicate array value

Question

I have a list of ndarrays (vectors); each vector has shape (1, 300).
My goal is to find duplicate vectors in the list, sum them, and divide the sum by the size of the list; the resulting vector then replaces each duplicate.
For example, if a is a list of ndarrays, a = [[2,3,1],[5,65,-1],[2,3,1]], then the first and last elements are duplicates. Their sum is [4,6,2], which is then divided by the size of the list, size = 3.

Output: a = [[4/3,6/3,2/3],[5,65,-1],[4/3,6/3,2/3]]

I have tried to use a Counter but it doesn't work for ndarrays.
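For context, Counter fails here because ndarrays are not hashable. The following is only a minimal sketch of a workaround, assuming each vector is converted to a tuple so it can be used as a key; the sample data and the names keys, counts and result are illustrative, and this is a plain Python loop rather than the vectorized NumPy approach being asked for.

```python
from collections import Counter

import numpy as np

# Illustrative sample data matching the example above.
a = [np.array([2, 3, 1]), np.array([5, 65, -1]), np.array([2, 3, 1])]

# Counter(a) raises TypeError because ndarrays are unhashable.
# A tuple of the vector's values is hashable and can serve as the key.
keys = [tuple(v.ravel().tolist()) for v in a]
counts = Counter(keys)

n = len(a)
# Duplicated vectors become (count * vector) / n; unique vectors are kept as-is.
result = [v * counts[k] / n if counts[k] > 1 else v for v, k in zip(a, keys)]
```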

What is the NumPy way to do this? Thanks.

Daniel F · Answer 1 · 2018-02-16T08:06:37.383

1

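If you have NumPy 1.13 or higher, this is pretty simple:

import numpy as np

def f(a):
    a = np.asarray(a)  # also accept a list of arrays
    u, inv, c = np.unique(a, return_inverse=True, return_counts=True, axis=0)
    # scale duplicated rows by count / n; leave unique rows unchanged
    p = np.where(c > 1, c / a.shape[0], 1)[:, None]
    return (u * p)[inv]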

If you don't have 1.13, you'll need some trick to convert a into a 1-D array first. I recommend @Jaime's excellent answer using np.void here.

How it works:

u holds the unique rows of a, in sorted order (so usually not in their original order).
c is the number of times each row of u is repeated in a.
inv holds the indices that rebuild a from u, i.e. u[inv] equals a.
p is the multiplier for each row of u based on your requirements: 1 if c == 1, and c / n (where n is the number of rows in a) if c > 1. [:, None] turns it into a column vector so that it broadcasts against the rows of u.

The function returns u * p, with the rows mapped back to their original positions by [inv].
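To make the steps above concrete, here is a minimal sketch that computes each intermediate value for the example array from the question. The inv.ravel() call is a defensive addition, not part of the code above; it only ensures inv is one-dimensional before indexing.

```python
import numpy as np

a = np.array([[2, 3, 1], [5, 65, -1], [2, 3, 1]])

u, inv, c = np.unique(a, return_inverse=True, return_counts=True, axis=0)
inv = inv.ravel()  # defensive: keep inv one-dimensional

# u   -> [[2, 3, 1], [5, 65, -1]]   unique rows, sorted
# inv -> [0, 1, 0]                  row i of a is row inv[i] of u
# c   -> [2, 1]                     [2, 3, 1] appears twice

# Column of multipliers: 2/3 for the duplicated row, 1 for the unique row.
p = np.where(c > 1, c / a.shape[0], 1)[:, None]

# Scale the unique rows, then expand back to the original row order.
result = (u * p)[inv]
```

Here u[inv] reproduces a exactly, which is why indexing the scaled rows with inv puts each result back where its source row was.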

edited Feb 16 '18 at 08:06

answered Feb 16 '18 at 07:35

Daniel F


It works now, after I converted! `[[ 1.33333333 2. 0.66666667] [ 5. 65. -1. ] [ 1.33333333 2. 0.66666667]]` Could you please add an explanation of how it works? – Art Feb 16 '18 at 07:52
I also converted `a` into `np.array` in order for it to work like this: `a = np.array([[2,3,1],[5,65,-1],[2,3,1]])` – Art Feb 16 '18 at 07:59
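The question describes vectors of shape (1, 300) rather than flat rows, so the conversion mentioned in the comments may need one more step. The sketch below assumes the vectors are stacked into a 2-D array with np.vstack before calling f; the (1, 4) sample data and the names vectors, stacked, averaged and result are illustrative only.

```python
import numpy as np

def f(a):
    # Same logic as the answer's function, repeated so the sketch runs on its own.
    u, inv, c = np.unique(a, return_inverse=True, return_counts=True, axis=0)
    p = np.where(c > 1, c / a.shape[0], 1)[:, None]
    return (u * p)[inv.ravel()]

# Illustrative data: three vectors of shape (1, 4) standing in for (1, 300).
vectors = [
    np.array([[2, 3, 1, 0]]),
    np.array([[5, 65, -1, 7]]),
    np.array([[2, 3, 1, 0]]),
]

stacked = np.vstack(vectors)  # shape (3, 4): one row per vector
averaged = f(stacked)         # shape (3, 4)

# Split back into a list of (1, 4) arrays if the original layout is needed.
result = [row[None, :] for row in averaged]
```

Stacking first matters because the [:, None] multiplier in f is shaped for a 2-D array with one vector per row.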

Espoir Murhabazi · Answer 2 · 2018-02-16T07:51:04.703

0

You can use np.unique with return_counts=True:

 elements, count = np.unique(a, axis=0, return_counts=True)

return_counts=True makes np.unique also return the number of occurrences of each unique row in the array.

The output looks like this:

(array([[ 2,  3,  1],
        [ 5, 65, -1]]), array([2, 1]))

Then you can multiply each unique row by its count like this:

(count * elements.T).T

Output:

array([[ 4,  6,  2],
       [ 5, 65, -1]])
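The double transpose is there so that count is applied per row rather than per column. As a minimal sketch, assuming the same example array, the same product can be written with a column-shaped count instead; the names via_transpose and via_column are illustrative.

```python
import numpy as np

a = np.array([[2, 3, 1], [5, 65, -1], [2, 3, 1]])
elements, count = np.unique(a, axis=0, return_counts=True)

# Form used above: transpose, multiply, transpose back.
via_transpose = (count * elements.T).T

# Equivalent form: reshape count to a column so it broadcasts across each row.
via_column = elements * count[:, None]
```

Both forms give the per-row sums. To reach the output in the question, the duplicated rows would still need to be divided by len(a) and mapped back to their original positions.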

edited Feb 16 '18 at 07:51

answered Feb 16 '18 at 07:33

Espoir Murhabazi

