
I have a 2D NumPy array containing str values. I would like to count how many times each unique row occurs in this array.

This would be the example input:

array([['A', 'B', 'C'],
       ['A', 'B', 'C'],
       ['C', 'E', 'F'],
       ['F', 'J', 'K']])

I would like to get the count of each unique row:

[A, B, C] -> 2
[C, E, F] -> 1
[F, J, K] -> 1

Thank you.

msalkind
    Does this answer your question? [numpy.unique with order preserved](https://stackoverflow.com/questions/15637336/numpy-unique-with-order-preserved) – RichieV Sep 01 '20 at 04:10

1 Answer


With a 2D array built from a list of lists, you can count the unique rows by passing axis=0 (to operate on rows) and return_counts=True to numpy.unique():

>>> import numpy as np
>>> a = np.array([(1,2,3),(1,2,3),(3,4,5),(5,6,7)])
>>> np.unique(a, return_counts=True, axis=0)
(array([[1, 2, 3],
       [3, 4, 5],
       [5, 6, 7]]), array([2, 1, 1]))

The first return value is the array of unique rows, and the second is the count for each of those rows. Without return_counts=True, you would get only the first return value. Without axis=0, the whole array would be flattened and individual elements, rather than rows, would be counted. axis=0 specifies that each row should be flattened (if it has more than one dimension) and then treated as a single value.
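Applied to the string data from the question, the same call could look like the sketch below. The names rows, unique_rows, counts and row_counts are only illustrative, and the dictionary step is an assumption about how the result might be consumed, not part of numpy.unique() itself.

```python
import numpy as np

# The example input from the question, with the letters as quoted strings.
rows = np.array([["A", "B", "C"],
                 ["A", "B", "C"],
                 ["C", "E", "F"],
                 ["F", "J", "K"]])

# axis=0 makes each row a single item; return_counts adds the occurrence counts.
unique_rows, counts = np.unique(rows, axis=0, return_counts=True)

# Pair each unique row with its count (tuples are used because they are hashable).
row_counts = {tuple(row.tolist()): int(n) for row, n in zip(unique_rows, counts)}

for row, n in row_counts.items():
    print(list(row), "->", n)
```

Note that numpy.unique() returns the unique rows in sorted order, which is not necessarily the order in which they first appear in the input.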

If you can use tuples instead of lists for the rows, then you can store them in a 1D object array and call numpy.unique() without the axis option.

This post explains how to use a list of tuples for a numpy array.

Together, it should look something like this:

>>> l = [(1,2,3),(1,2,3),(3,4,5),(5,6,7)]
>>> a = np.empty(len(l), dtype=object)
>>> a
array([None, None, None, None], dtype=object)
>>> a[:] = l
>>> a
array([(1, 2, 3), (1, 2, 3), (3, 4, 5), (5, 6, 7)], dtype=object)
>>> np.unique(a, return_counts=True)
(array([(1, 2, 3), (3, 4, 5), (5, 6, 7)], dtype=object), array([2, 1, 1]))
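
If the rows are already tuples, a different technique from the one above is also available: counting them with collections.Counter from the standard library, without NumPy. This is only a sketch of that alternative; the names rows and row_counts are illustrative.

```python
from collections import Counter

# Rows as hashable tuples, same data as in the example above.
rows = [(1, 2, 3), (1, 2, 3), (3, 4, 5), (5, 6, 7)]

# Counter maps each distinct tuple to the number of times it occurs.
row_counts = Counter(rows)

for row, n in row_counts.items():
    print(row, "->", n)
```

Unlike numpy.unique(), a Counter keeps its keys in first-seen order rather than sorted order, which may matter if the original row order should be preserved.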
Jack Thias