
I am working on a data set where, for each value in an array, I need the gap between that value and the one before it, divided by the current value. The results are then summed.

For example :

result = 0
arr = [...]
for i in range(len(arr)):
    if i > 0:
        result += (arr[i] - arr[i - 1]) / arr[i]

I would like to do this as a "one-liner", preferably without an explicit loop, to improve performance. Is there a way to do this?


By analogy, I am looking for something like how

arr = list(range(min, max))

is equivalent to:

arr = []
for i in range(max - min):
    arr.append(i + min)

My apologies if this is a noob-ish question, but I hope you can help me :)

gushkash

2 Answers


This is a use case for NumPy (or, more generally, any library that supports vectorized computation in Python; TensorFlow might work too).

import numpy as np
np_array = np.asarray(arr)
result = np.sum((np_array[1:]-np_array[:-1])/np_array[1:])

If the [1:] and [:-1] notation above is unclear, take a moment to learn about Python slicing: [1:] selects everything except the first element, and [:-1] selects everything except the last.
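To make the slicing concrete, here is a small sketch on made-up data (the values and the names `current`, `previous`, and `loop_result` are only for illustration). It shows which elements each slice picks out and checks the result against the loop from the question.

```python
import numpy as np

arr = [2, 4, 8, 16]          # example data, not from the question
np_array = np.asarray(arr)

current = np_array[1:]       # every element except the first: [4, 8, 16]
previous = np_array[:-1]     # every element except the last:  [2, 4, 8]

# Element-wise (current - previous) / current, then summed.
result = np.sum((current - previous) / current)

# The same computation with the question's loop, for comparison.
loop_result = 0
for i in range(1, len(arr)):
    loop_result += (arr[i] - arr[i - 1]) / arr[i]

print(result, loop_result)
```

Each position in `current` lines up with its predecessor in `previous`, which is why the subtraction needs no index arithmetic.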

EDIT: On the topic of speed: NumPy (or any vectorized approach) is generally faster than an explicit Python loop. If speed is very important and the arrays are large, CuPy may be faster still, although it requires a supported GPU and copying data to and from the GPU has its own cost.

Xavi

np.diff is what you want:

import numpy as np
np_result = np.sum(np.diff(arr) / arr[1:])

np.testing.assert_almost_equal(np_result, result)
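As a quick illustration on made-up values (the data and the names `diffs` and `sliced` are assumptions for the example), `np.diff` returns the consecutive differences, which is the same array you get by subtracting the two slices:

```python
import numpy as np

arr = np.asarray([3, 6, 12, 24])   # illustrative values only

diffs = np.diff(arr)               # consecutive differences: [3, 6, 12]
sliced = arr[1:] - arr[:-1]        # the same differences written with slices

# Divide each difference by the current value and sum.
np_result = np.sum(diffs / arr[1:])
print(diffs, np_result)
```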

Using timeit and an initial array of

np.random.seed(1)
arr = np.random.randint(low=1, high=100, size=(10000,))

we get for the non-vectorized code:

3.59 ms ± 68.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

and for the vectorized code:

33.2 µs ± 922 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

EDIT: @Xavi's slicing-based solution, which does not use np.diff, is marginally faster:

29.7 µs ± 802 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

30 microseconds is 0.03 milliseconds, so compared with 3.59 ms that is a speedup of roughly 100x, which is what we would expect for a trivially vectorizable operation.
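If you want to reproduce the comparison outside IPython, something along these lines should work with the standard `timeit` module. This is only a sketch: the function names `loop_version`, `diff_version`, and `slice_version` are made up here, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

np.random.seed(1)
arr = np.random.randint(low=1, high=100, size=(10000,))

def loop_version(values):
    # The explicit loop from the question.
    result = 0
    for i in range(1, len(values)):
        result += (values[i] - values[i - 1]) / values[i]
    return result

def diff_version(values):
    # Vectorized with np.diff.
    return np.sum(np.diff(values) / values[1:])

def slice_version(values):
    # Vectorized with plain slicing.
    return np.sum((values[1:] - values[:-1]) / values[1:])

for fn in (loop_version, diff_version, slice_version):
    # Average over 100 calls; timeit returns total seconds.
    seconds = timeit.timeit(lambda: fn(arr), number=100) / 100
    print(f"{fn.__name__}: {seconds * 1e6:.1f} us per call")
```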

mmdanziger
Thanks for benchmarking my solution too! I assume the slight overhead comes from how most standard numpy functions have some extra functionality that their implementation has to account for with some checks ([see here](https://github.com/numpy/numpy/blob/v1.23.0/numpy/lib/function_base.py#L1319-L1449)). That being said, it's better practice to use descriptive functions (such as diff); I only avoided it for the sake of keeping it simple. – Xavi Jul 27 '22 at 09:42