
I have a list of NumPy ndarrays (vectors), each with shape (1, 300).
My goal is to find duplicate vectors in the list, sum them, and divide the sum by the size of the list; the resulting vector then replaces each of the duplicates.
For example, if a is a list of ndarrays, a = [[2,3,1],[5,65,-1],[2,3,1]], then the first and last elements are duplicates. Their sum is [4,6,2], which is then divided by the size of the list, size = 3.

Output: a = [[4/3,6/3,2/3],[5,65,-1],[4/3,6/3,2/3]]

I have tried to use a Counter, but it doesn't work for ndarrays.

What is the NumPy way? Thanks.

Art

2 Answers


If you have NumPy 1.13 or higher (the release that added the axis argument to np.unique), this is pretty simple:

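import numpy as np

def f(a):
    a = np.asarray(a)  # also accept a plain list of rows
    u, inv, c = np.unique(a, return_counts=True, return_inverse=True, axis=0)
    # multiplier per unique row: count / n for duplicates, 1 otherwise
    p = np.where(c > 1, c / a.shape[0], 1)[:, None]
    return (u * p)[inv]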

If you don't have 1.13, you'll need some trick to convert a into a 1-d array first. I recommend @Jaime's excellent answer using np.void here

How it works:

  • u holds the unique rows of a (sorted, so usually not in their original order)
  • c is the number of times each row of u is repeated in a
  • inv holds the indices that map u back to a, i.e. u[inv] == a
  • p is the multiplier for each row of u, based on your requirements: 1 if c == 1, and c / n (where n is the number of rows in a) if c > 1. [:, None] turns it into a column vector so that it broadcasts against u

The function returns u * p, indexed back to the original row positions by [inv].
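
To make the steps above concrete, here is a small sketch that runs them on the example array from the question and shows each intermediate value. The variable names n and result are only for illustration, and the inv.ravel() call is a defensive assumption so the index array is flat regardless of the installed NumPy version.

```python
import numpy as np

# Example data from the question
a = np.array([[2, 3, 1], [5, 65, -1], [2, 3, 1]])
n = a.shape[0]  # number of rows (size of the list)

u, inv, c = np.unique(a, return_counts=True, return_inverse=True, axis=0)
inv = inv.ravel()  # keep the index array flat before fancy indexing

print(u)    # [[ 2  3  1]
            #  [ 5 65 -1]]
print(inv)  # [0 1 0]
print(c)    # [2 1]

# Multiplier per unique row: count / n for duplicated rows, 1 otherwise
p = np.where(c > 1, c / n, 1.0)[:, None]

# Scale the unique rows, then map them back to the original positions
result = (u * p)[inv]
print(result)
```

Multiplying a duplicated row by c / n gives the same result as summing its c copies and dividing by n, which is why no explicit sum is needed.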

Daniel F
  • It works now, after I converted! `[[ 1.33333333 2. 0.66666667] [ 5. 65. -1. ] [ 1.33333333 2. 0.66666667]]` Could you please add an explanation on how it works? – Art Feb 16 '18 at 07:52
  • I also converted `a` into `np.array` in order for it to work like this: `a = np.array([[2,3,1],[5,65,-1],[2,3,1]])` – Art Feb 16 '18 at 07:59

You can use np.unique with return_counts=True:

 elements, count = np.unique(a, axis=0, return_counts=True)

return_counts makes np.unique also return the number of occurrences of each unique element (here, each unique row) in the array.

The output looks like this:

(array([[ 2,  3,  1],
        [ 5, 65, -1]]), array([2, 1]))

Then you can multiply each unique row by its count like this:

(count * elements.T).T

Output:

array([[ 4,  6,  2],
       [ 5, 65, -1]])
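
The result above contains one summed row per unique vector. As a rough sketch of how it could be taken the rest of the way to the output requested in the question, the snippet below divides only the duplicated rows by the number of rows and uses return_inverse to put every row back in its original position. The names inverse, summed, scaled, dup, and result are illustrative and not part of the answer above.

```python
import numpy as np

a = np.array([[2, 3, 1], [5, 65, -1], [2, 3, 1]])

elements, inverse, count = np.unique(
    a, axis=0, return_inverse=True, return_counts=True
)

# Sum of each group of identical rows (same expression as above)
summed = (count * elements.T).T

# Divide only the duplicated rows by the number of rows in a
scaled = summed.astype(float)
dup = count > 1
scaled[dup] /= a.shape[0]

# Map each unique row back to every position it occupied in a
result = scaled[inverse.ravel()]
print(result)
```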
Espoir Murhabazi