I need to find the duplicate numbers in multiple one-dimensional arrays (the rows of a 2-D array) and how many times each one repeats. `np.unique` handles this well for a single 1-D array, but it does not seem to work row by row on a 2-D array. I have searched for similar answers, but I need a more detailed report: the number of occurrences of every number, and their position indices.
I looked at "Can numpy bincount work with 2D arrays?", but that answer does not match what I need. I would like a map that carries more information about the data, such as which number occurs most often. I would also prefer to avoid loops. That may not be entirely realistic, but I am looking for a loop-free approach because my speed requirements are very strict.
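Because the position indices are part of the requested report, here is one possible sketch (not the asker's code) that groups column indices by (row, value) with a single stable sort instead of a per-row loop. It assumes the array holds non-negative integers, and `positions2d` is a hypothetical helper name. The result uses a flat, CSR-like layout: `cols[offsets[i]:offsets[i+1]]` are the columns of row `rows[i]` where `values[i]` occurs, and `np.diff(offsets)` gives the matching counts.

```python
import numpy as np

def positions2d(a):
    """Group column positions by (row, value) without a Python loop.

    Assumes `a` is a 2-D array of non-negative integers.
    Returns (rows, values, offsets, cols) where
    cols[offsets[i]:offsets[i + 1]] are the columns in row rows[i]
    that contain values[i].
    """
    a = np.asarray(a)
    n_rows, n_cols = a.shape
    width = int(a.max()) + 1
    # Composite key that is unique per (row, value) pair
    keys = (a + np.arange(n_rows)[:, None] * width).ravel()
    # Stable sort groups equal keys and keeps columns in ascending order
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    cols = order % n_cols  # flat index -> column index
    # Start index of each run of equal keys
    starts = np.flatnonzero(np.r_[True, sorted_keys[1:] != sorted_keys[:-1]])
    offsets = np.r_[starts, keys.size]
    uniq = sorted_keys[starts]
    return uniq // width, uniq % width, offsets, cols

a = np.array([[1, 2, 2, 2, 3],
              [0, 1, 1, 1, 2],
              [0, 0, 0, 1, 0]])
rows, values, offsets, cols = positions2d(a)
counts = np.diff(offsets)  # occurrences of each (row, value) pair
```

Only values that actually occur in a row appear in this layout, so zero counts are implicit rather than stored.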
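On the bincount question: a common trick (sketched here as an illustration, not taken from the linked answer) is to shift each row's values into its own disjoint range of bins, so that a single 1-D `np.bincount` counts every row at once. `argmax` along each row then gives the most frequent value. It assumes non-negative integers, and `bincount2d` is a hypothetical name.

```python
import numpy as np

def bincount2d(a):
    """Row-wise bincount of a 2-D array of non-negative integers.

    Returns an array of shape (n_rows, a.max() + 1) where
    counts[r, v] is the number of times value v occurs in row r.
    """
    a = np.asarray(a)
    n_rows = a.shape[0]
    width = int(a.max()) + 1
    # Row r uses bins [r * width, (r + 1) * width), so rows never collide
    shifted = a + np.arange(n_rows)[:, None] * width
    # minlength pads trailing empty bins so the reshape always fits
    counts = np.bincount(shifted.ravel(), minlength=n_rows * width)
    return counts.reshape(n_rows, width)

a = np.array([[1, 2, 2, 2, 3],
              [0, 1, 1, 1, 2],
              [0, 0, 0, 1, 0]])
counts = bincount2d(a)
most_frequent = counts.argmax(axis=1)  # per-row mode; ties go to the smallest value
```

This is linear in the number of elements and has no Python-level loop. The trade-off is memory: the output has `a.max() + 1` columns, so it suits small value ranges.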
For example:
```python
a = np.array([[1, 2, 2, 2, 3],
              [0, 1, 1, 1, 2],
              [0, 0, 0, 1, 0]])

# Occurrences of each value in the first row:
# value  count
#   0      0
#   1      1
#   2      3
#   3      1

# Desired output: entry [r, v] = number of times value v occurs in row r
[[0 1 3 1]
 [1 3 1 0]
 [4 1 0 0]]
```
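To check any vectorized version against the expected output, a plain per-row reference built from `np.unique` (the 1-D tool mentioned above) may help. It loops over rows, so it is meant only as a correctness baseline, not as the fast solution; `unique_counts_reference` is a hypothetical name and non-negative integers are assumed.

```python
import numpy as np

def unique_counts_reference(a):
    """Slow but simple row-by-row counts, for verifying faster versions."""
    a = np.asarray(a)
    width = int(a.max()) + 1
    out = np.zeros((a.shape[0], width), dtype=np.int64)
    for r, row in enumerate(a):
        vals, cnts = np.unique(row, return_counts=True)
        out[r, vals] = cnts  # scatter this row's counts into its columns
    return out

a = np.array([[1, 2, 2, 2, 3],
              [0, 1, 1, 1, 2],
              [0, 0, 0, 1, 0]])
ref = unique_counts_reference(a)
```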
Because this runs inside an outer loop, I need an efficient vectorized way to compute the statistics for many rows at once, without adding another loop.
So far I count the results with a group-by aggregation (`numpy_indexed`). The function builds a first key that identifies the row, uses the data itself as the second key, and sums a 2-D array of all ones within each group. It produces the correct output, but I consider it a stopgap and would like a proper approach.
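The same two-key idea can be expressed with plain NumPy by fusing the row key and the value into one integer and passing it to `np.unique(..., return_counts=True)`. This is a sketch under the assumption of non-negative integer data, with `unique2d_np` as a hypothetical name; it drops the `numpy_indexed` dependency but, like the group-by version, relies on a sort, so it is O(n log n) rather than the O(n) of a bincount.

```python
import numpy as np

def unique2d_np(x):
    """Row-wise value counts via a fused (row, value) key and np.unique."""
    x = np.asarray(x, dtype=np.int64)
    n_rows, n_cols = x.shape
    width = int(x.max()) + 1
    row_key = np.repeat(np.arange(n_rows), n_cols)  # key1: row of each element
    combined = row_key * width + x.ravel()          # fuse key1 and key2
    keys, counts = np.unique(combined, return_counts=True)
    out = np.zeros(n_rows * width, dtype=np.int64)
    out[keys] = counts  # fused key doubles as the flat output index
    return out.reshape(n_rows, width)

a = np.array([[1, 2, 2, 2, 3],
              [0, 1, 1, 1, 2],
              [0, 0, 0, 1, 0]])
result = unique2d_np(a)
```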
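```python
import numpy as np
import numpy_indexed as npi

def unique2d(x):
    x = x.astype(int)
    mx = int(np.nanmax(x)) + 1  # number of value bins per row
    # key1: the row index of every element
    ltbe = np.tile(np.arange(x.shape[0])[:, None], (1, x.shape[1]))
    # an all-ones array; summing it per group yields each group's size
    vtbe = np.ones(x.shape, dtype=int)
    groups = npi.group_by((ltbe.ravel(), x.ravel()))  # key2: the value itself
    unique, counts = groups.sum(vtbe.ravel())
    # scatter the counts into a flat (rows * mx) table, then reshape to 2-D
    ctbe = np.zeros(x.shape[0] * mx, dtype=int)
    ctbe[(unique[0] * mx + unique[1]).astype(int)] = counts
    return ctbe.reshape(x.shape[0], mx)
```

```python
>>> unique2d(a)
array([[0, 1, 3, 1],
       [1, 3, 1, 0],
       [4, 1, 0, 0]])
```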
I would appreciate any suggestions for a better algorithm. Thanks!