4

I have two numpy arrays of integers A and B. The values in array A and B correspond to time-points at which events A and B occurred. I would like to transform A to contain the time since the most recent event b occurred.

I know I need to subtract each element of A by its nearest smaller the element of B but am unsure of how to do so. Any help would be greatly appreciated.

>>> import numpy as np

>>> A = np.array([11, 12, 13, 17, 20, 22, 33, 34])
>>> B = np.array([5, 10, 15, 20, 25, 30])

Desired Result:

cond_a = relative_timestamp(to_transform=A, reference=B)
cond_a
>>> array([1, 2, 3, 2, 0, 2, 3, 4])
yatu
  • 86,083
  • 12
  • 84
  • 139
  • Hiya, sorry if I was not clear. The goal is to calculate the minimum distance between each event A and events in B smaller than that event. – Ruairi O'Sullivan May 07 '19 at 14:41
  • 1
    Hey Rafael. In reality A and B contain hundred of thousands of entries. I am also referencing many different arrays to B. I don't know much about computation times but I think a for loop might be too slow. – Ruairi O'Sullivan May 07 '19 at 15:01
  • You can use [np.searchsorted](https://docs.scipy.org/doc/numpy/reference/generated/numpy.searchsorted.html): `idx=np.searchsorted(B,A, side='right')`; `result=A-B[idx-1]` – Brenlla May 07 '19 at 15:10
  • @Brenlla Any reason not to post this as an answer ? – rafaelc May 07 '19 at 15:23
  • @RafaelC No particular reason, but I had the feeling this must have been asked before. After search, I found couple answers [here](https://stackoverflow.com/a/34915542/6091318) and [here](https://stackoverflow.com/a/20780569/6091318) – Brenlla May 07 '19 at 15:30
  • @Brenlla Well, hard to close this question with any of these 2 as duplicates. SO I suggest you post it as an answer, specially for future referrals ;} – rafaelc May 07 '19 at 15:53

3 Answers3

2

You can use np.searchsorted to find the indices where the elements of A should be inserted in B to maintain order. In other words, you are finding the closest elemet in B for each element in A:

idx = np.searchsorted(B, A, side='right')
result = A-B[idx-1] # substract one for proper index

According to the docs searchsorted uses binary search, so it will scale fine for large inputs.

Brenlla
  • 1,471
  • 1
  • 11
  • 23
1

Here's an approach consisting on computing the pairwise differences. Note that it has a O(n**2) complexity so it might for larger arrays @brenlla's answer will perform much better.

The idea here is to use np.subtract.outer and then find the minimum difference along axis 1 over a masked array, where only values in B smaller than a are considered:

dif = np.abs(np.subtract.outer(A,B))
np.ma.array(dif, mask = A[:,None] < B).min(1).data
# array([1, 2, 3, 2, 0, 2, 3, 4])
yatu
  • 86,083
  • 12
  • 84
  • 139
  • This works fine, but the time (and memory) complexity of an `outer` operation is O(n**2), so this is only useful for small inputs – Brenlla May 07 '19 at 15:53
  • Yes agreed... You're answer with `np.searchsorted` is much better, hadn't thought about using it here. I'd suggest you post it as an answer and I'll remove mine or add a note warning about the complexity @brenlla – yatu May 07 '19 at 15:57
0

As I am not sure, if it is really faster to calculate all pairwise differences, instead of a python loop over each array entry (worst case O(Len(A)+len(B)), the solution with a loop:

A = np.array([11, 12, 13, 17, 20, 22, 33, 34])
B = np.array([5, 10, 15, 20, 25, 30])

def calculate_next_distance(to_transform, reference):
    max_reference = len(reference) - 1
    current_reference = 0
    transformed_values = np.zeros_like(to_transform)
    for i, value in enumerate(to_transform):
        while current_reference < max_reference and reference[current_reference+1] <= value:
            current_reference += 1
        transformed_values[i] = value - reference[current_reference]
    return transformed_values

calculate_next_distance(A,B)
# array([1, 2, 3, 2, 0, 2, 3, 4])
Sparky05
  • 4,692
  • 1
  • 10
  • 27