I'm trying to find the first occurrence of any row in an array in which either column has a number that has changed since the last time it appeared. Given the array below:

import numpy as np
arr = np.array([[1, 11], [2, 21], [3, 31], [4, 41], [1, 11], [2, 21], [3, 31], [4, 42]])

The output I'm looking for would look like:

subArr = [[1, 11]
          [2, 21] 
          [3, 31] 
          [4, 41] 
          [4, 42]]

In the actual problem, the numbers are not as sequential as they appear here and cannot be predicted in advance. I've tried finding the first instance in an array, using multiple conditions, trying to get the first element in a 2-D array, and accessing the ith column. Although some of these were helpful, I can't get it to do all the things I want. I tried:

subArr = arr[np.unique(np.logical_and(arr[:,0][0], arr[:,1][0]))]

which didn't work. I also tried:

subArr = arr[(arr[:,0][0]) & (arr[:,1][0])]

I'm sure it's just a matter of getting the syntax right but I can't figure out what I'm missing. Any help would be greatly appreciated.

Using:

Python 3.6

Numpy 1.18.1

Bar-Tzur

1 Answer

Use the axis parameter of numpy.unique:

In [16]: arr                                                                                          
Out[16]: 
array([[ 1, 11],
       [ 2, 21],
       [ 3, 31],
       [ 4, 41],
       [ 1, 11],
       [ 2, 21],
       [ 3, 31],
       [ 4, 42]])

In [17]: np.unique(arr, axis=0)                              
Out[17]: 
array([[ 1, 11],
       [ 2, 21],
       [ 3, 31],
       [ 4, 41],
       [ 4, 42]])

The returned values are copies of the rows from the original array, so it doesn't really make sense to ask if a row in the output corresponds to the first occurrence of the same values in the input.

If you need to know the indices of the first occurrence of each unique row in the input, you can add the argument return_index. When you do this, unique ensures that the index will be that of the first occurrence of the corresponding unique value:

In [51]: values, indices = np.unique(arr, return_index=True, axis=0)

In [52]: values
Out[52]: 
array([[ 1, 11],
       [ 2, 21],
       [ 3, 31],
       [ 4, 41],
       [ 4, 42]])

In [53]: indices
Out[53]: array([0, 1, 2, 3, 7])
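
As a minimal sketch of how the returned indices can be used, the snippet below checks that indexing the input with `indices` reproduces `values`, and then uses the same indices to look up a per-row timestamp. The `timestamps` array is a hypothetical stand-in for the datetime data mentioned in the comments; it is not part of the original question.

```python
import numpy as np

arr = np.array([[1, 11], [2, 21], [3, 31], [4, 41],
                [1, 11], [2, 21], [3, 31], [4, 42]])

# Hypothetical timestamps, one per row of arr (assumed for illustration only).
timestamps = np.datetime64("2020-06-09T00") + np.arange(8) * np.timedelta64(1, "h")

values, indices = np.unique(arr, return_index=True, axis=0)

# Indexing the input with the returned indices reproduces the unique rows.
rows_match = np.array_equal(arr[indices], values)

# The same indices select the timestamp recorded at each row's first occurrence.
first_seen = timestamps[indices]
```

Here `[1, 11]` maps to index 0 rather than index 4, so `first_seen[0]` is the timestamp of the first row.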
Warren Weckesser
  • This works great. Can I rest assured that this is referencing the first occurrence specifically (e.g. `[1,11]` is from the first row, not from the 5th row)? – Bar-Tzur Jun 09 '20 at 16:35
  • The returned values are copies of the elements, so does it make a difference? – Warren Weckesser Jun 09 '20 at 16:52
  • Down the line it will. I will need to compare the row index to another array of `datetime` datatypes that were on the same row with the line I retrieved. The timestamp needs to be from the first occurrence and I had planned to use this array to find the index of the timestamp in the original array, which initially had the timestamps in it. If the index of a row is from the wrong row it would retrieve the wrong timestamp. – Bar-Tzur Jun 09 '20 at 17:02
  • Then use `return_index` as I show in my updated answer. If you don't use `return_index=True`, then you have to compare each unique value to the input array to recover the original index. – Warren Weckesser Jun 09 '20 at 17:44
  • Thanks @Warren, this does give me what I was looking for. As an aside, it appears that the indices are returned in the sequential order of the value in the first column followed by the value in the second column, not necessarily in the order in which the values appear in the array. That threw me for a second. Regardless, you did provide me with a solution to my question and I sincerely appreciate your help. Thanks again. – Bar-Tzur Jun 09 '20 at 18:37
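
The ordering noted in the last comment follows from `np.unique` returning its unique rows sorted, with the indices arranged to match. If the rows are wanted in the order in which they first appear, one possible approach is to sort the returned indices and index the input with them. The array below is illustrative data chosen so that the two orderings differ; it is not taken from the question.

```python
import numpy as np

# Illustrative data (an assumption): first appearances are not in sorted order.
arr = np.array([[4, 41], [1, 11], [4, 41], [2, 21], [1, 11], [4, 42]])

_, indices = np.unique(arr, return_index=True, axis=0)
# indices line up with the sorted unique rows: [1, 11], [2, 21], [4, 41], [4, 42]

# Sorting the indices restores the order of first appearance in the input.
first_idx = np.sort(indices)
in_order = arr[first_idx]
```

With this input, `indices` is `[1, 3, 0, 5]` and `in_order` starts with `[4, 41]`, the first row of the input.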