
I want to split a numpy array based on the values of its first two columns. The split should happen at the index right after both columns reach their maximum at the same time. Each column reaches its maximum several times, and each maximum can also appear on its own (while the other column is not at its maximum), but I only want to split where both are at their maximum simultaneously. Let's say I have

arr =  [[ 1., 5, 12],
        [ 1., 9,  5],
        [15., 5,  5],
        [25., 7,  4],
        [25., 9,  4],
        [1.5, 4, 10],
        [ 1., 8,  7],
        [20., 5,  6],
        [25., 8,  3],
        [25., 9,  3]]

I want to get:

arr_1 = [[ 1., 5, 12],
         [ 1., 9,  5],
         [15., 5,  5],
         [25., 7,  4],
         [25., 9,  4]]

arr_2 = [[1.5, 4, 10],
         [ 1., 8,  7],
         [20., 5,  6],
         [25., 8,  3],
         [25., 9,  3]]
(asked by Ali_d, edited by Mad Physicist)

2 Answers


Assuming you want the output to be a list of lists, you can iterate over the rows of the original array and look for a "separating" row, i.e. one whose first two entries both equal their column maxima.

One possible implementation:

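def split_at_max(arr):
    # Column-wise maxima of the first two columns
    m0 = max(a[0] for a in arr)
    m1 = max(a[1] for a in arr)
    res = [[]]
    for i, a in enumerate(arr):
        res[-1].append(a)
        # Start a new group after a row where both columns are at their maximum,
        # unless it is the last row (avoids an empty trailing group).
        # Note: comparing a[:2] to a list assumes each row is a Python list.
        if (a[:2] == [m0, m1]) and (i != len(arr) - 1):
            res.append([])
    return res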
(answered by N.C)
  • Dear @N.C, I appreciate your clever solution. Sorry, I made a mistake and showed my data as a list; in fact, they are numpy arrays. Do you know any way to adapt your solution for an array rather than a list? Thanks in advance, – Ali_d Sep 25 '20 at 12:25
  • In order to fix the function to handle numpy arrays you can use the condition: `(a[0] == m0) and (a[1] == m1)` in the if statement. What is the desired type of the output? – N.C Sep 25 '20 at 12:33
  • I prefer to have my results as numpy arrays. Thanks for replying and for your time. – Ali_d Sep 25 '20 at 12:37
  • @Ali_d. I've added a pure numpy solution – Mad Physicist Sep 25 '20 at 12:45
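For reference, here is a minimal sketch of the loop-based approach adapted to numpy input along the lines suggested in the comments: it uses the element-wise condition `row[0] == m0 and row[1] == m1` and converts each group back to an array. The name `split_at_max_np` is hypothetical, and it assumes a 2-D array with at least one row.

```python
import numpy as np

def split_at_max_np(arr):
    """Loop-based split of a 2-D numpy array (hypothetical helper).

    Splits after every row where column 0 and column 1 both equal
    their column-wise maxima, and returns a list of numpy arrays.
    """
    m0 = arr[:, 0].max()
    m1 = arr[:, 1].max()
    groups = [[]]
    for i, row in enumerate(arr):
        groups[-1].append(row)
        # Compare scalars one by one; comparing a slice to a list would give
        # an element-wise boolean array, which cannot be used in an `if`.
        if row[0] == m0 and row[1] == m1 and i != len(arr) - 1:
            groups.append([])
    # Stack each group of 1-D rows back into a 2-D array
    return [np.array(g) for g in groups]

# Example data copied from the question
arr = np.array([[ 1., 5, 12],
                [ 1., 9,  5],
                [15., 5,  5],
                [25., 7,  4],
                [25., 9,  4],
                [1.5, 4, 10],
                [ 1., 8,  7],
                [20., 5,  6],
                [25., 8,  3],
                [25., 9,  3]])

arr_1, arr_2 = split_at_max_np(arr)
print(arr_1.shape, arr_2.shape)  # (5, 3) (5, 3)
```

The loop is easy to read but runs in Python; for large arrays the vectorized approach in the other answer avoids the per-row overhead.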

You can create a boolean mask of all locations where each of the first two columns is equal to its maximum:

max_val = arr[:, :2].max(axis=0)
mask = arr[:, :2] == max_val

Then make a row mask of all places where both columns match their maximum:

row_mask = mask.all(axis=1)
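To see what these intermediate arrays contain, here is a small sketch run on the question's data (the data is copied from the question; the printed values apply to that input only):

```python
import numpy as np

# Example data copied from the question
arr = np.array([[ 1., 5, 12],
                [ 1., 9,  5],
                [15., 5,  5],
                [25., 7,  4],
                [25., 9,  4],
                [1.5, 4, 10],
                [ 1., 8,  7],
                [20., 5,  6],
                [25., 8,  3],
                [25., 9,  3]])

max_val = arr[:, :2].max(axis=0)  # column-wise maxima of the first two columns
mask = arr[:, :2] == max_val      # broadcast comparison, shape (10, 2)
row_mask = mask.all(axis=1)       # True where both columns are at their maximum

print(max_val)                    # [25.  9.]
print(np.flatnonzero(row_mask))   # [4 9]
```

Broadcasting compares every row of `arr[:, :2]` against the length-2 vector `max_val`, so no explicit loop is needed.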

You want the index of the row after each match, so you can do one of the following:

shifted_row_mask = np.r_[False, row_mask[:-1]]
index = np.flatnonzero(shifted_row_mask)

Or

index = np.flatnonzero(row_mask[:-1]) + 1

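In both cases you discard the last element of row_mask and shift the matches by one. Dropping the last element matters because a match in the final row would otherwise produce a split index equal to len(arr), and np.split would then return an empty trailing array.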
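A quick sketch on the question's data checks that both index constructions agree, and shows what happens if the last element is not dropped (the variable names `index_a`, `index_b` and `index_naive` are only for illustration):

```python
import numpy as np

# Example data copied from the question
arr = np.array([[ 1., 5, 12],
                [ 1., 9,  5],
                [15., 5,  5],
                [25., 7,  4],
                [25., 9,  4],
                [1.5, 4, 10],
                [ 1., 8,  7],
                [20., 5,  6],
                [25., 8,  3],
                [25., 9,  3]])
row_mask = (arr[:, :2] == arr[:, :2].max(axis=0)).all(axis=1)

# Both ways of building the split indices give the same result
index_a = np.flatnonzero(np.r_[False, row_mask[:-1]])
index_b = np.flatnonzero(row_mask[:-1]) + 1
print(index_a, index_b)  # [5] [5]

# Keeping the last element: the match in the final row becomes split
# index 10 == len(arr), which yields an empty third piece.
index_naive = np.flatnonzero(row_mask) + 1
pieces = np.split(arr, index_naive, axis=0)
print(index_naive, [p.shape for p in pieces])  # [ 5 10] [(5, 3), (5, 3), (0, 3)]
```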

Now you can just call np.split:

result = np.split(arr, index, axis=0)

This can all be written as a nice, totally illegible, one-liner:

result = np.split(arr, np.flatnonzero((arr[:, :2] == arr[:, :2].max(axis=0)).all(axis=1)[:-1]) + 1, axis=0)

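If you want the output in the exact format you showed, keep only the first index and unpack the result of np.split. Pass it as a one-element sequence: a bare integer would be read by np.split as the number of equal sections to cut the array into.

arr_1, arr_2 = np.split(arr, index[:1], axis=0)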
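The steps above can be wrapped into one small function. This is a sketch only: the name `split_after_joint_max` and its `ncols` parameter are hypothetical, and it assumes a 2-D array.

```python
import numpy as np

def split_after_joint_max(arr, ncols=2):
    """Split `arr` after every row where the first `ncols` columns all equal
    their column-wise maxima (hypothetical helper wrapping the steps above)."""
    head = arr[:, :ncols]
    row_mask = (head == head.max(axis=0)).all(axis=1)
    # Drop the last row so a match there does not create an empty final piece
    index = np.flatnonzero(row_mask[:-1]) + 1
    # `index` is always an array here, so np.split treats it as split points
    return np.split(arr, index, axis=0)

# Example data copied from the question
arr = np.array([[ 1., 5, 12],
                [ 1., 9,  5],
                [15., 5,  5],
                [25., 7,  4],
                [25., 9,  4],
                [1.5, 4, 10],
                [ 1., 8,  7],
                [20., 5,  6],
                [25., 8,  3],
                [25., 9,  3]])

parts = split_after_joint_max(arr)
arr_1, arr_2 = parts  # exactly two pieces for the example data
print(len(parts))     # 2
```

Because the index array is passed straight to np.split, the same function works for any number of joint-maximum rows; unpacking into exactly two names only works when there is a single split point.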
(answered by Mad Physicist)