how to implement this array algorithm in a more efficient way?

Question

Assuming I have n = 3 lists of same length for example:

R1 = [7,5,8,6,0,6,7]

R2 = [8,0,2,2,0,2,2]

R3 = [1,7,5,9,0,9,9]

I need to find the first index t that verifies the n = 3 following conditions for a period p = 2. Edit: the meaning of period p is the number of consecutive "boxes".

R1[t] >= 5, R1[t+1] >= 5. Here t +p -1 = t+1, we need to only verify for two boxes t and t+1. If p was equal to 3 we will need to verify for t, t+1 and t+2. Note that It's always the same number for which we test, we always test if it's greater than 5 for every index. The condition is always the same for all the "boxes".
R2[t] >= 2, R2[t+1] >= 2
R3[t] >= 9, R3[t+1] >= 9

In total there is 3 * p conditions.

Here the t I am looking for is 5 (indexing is starting from 0).

The basic way to do this is by looping on all the indexes using a for loop. If the condition is found for some index t we store it in some local variable temp and we verify the conditions still hold for every element whose index is between t+1 and t+p -1. If while checking, we find an index that does not satisfy a condition, we forget about the temp and we keep going.

What is the most efficient way to do this in Python if I have large lists (like of 10000 elements)? Is there a more efficient way than the for loop?

Show an example and mcve. You're looking for more efficient, but I don't see a first order implementation — Mad Physicist, Jul 05 '20 at 23:25

paime · Answer 1 · 2020-07-06T09:29:49.200

Since all your conditions are the same (>=), we could leverage this.

This solution will work for any number of conditions and any size of analysis window, and no for loop is used.

You have an array:

>>> R = np.array([R1, R2, R3]).T                                                                                                                                                                         
>>> R
array([[7, 8, 1],
       [5, 0, 7],
       [8, 2, 5],
       [6, 2, 9],
       [0, 0, 0],
       [6, 2, 9],
       [7, 2, 9]]

and you have thresholds:

>>> thresholds = [5, 2, 9]

So you can check where the conditions are met:

>>> R >= thresholds
array([[ True,  True, False],
       [ True, False, False],
       [ True,  True, False],
       [ True,  True,  True],
       [False, False, False],
       [ True,  True,  True],
       [ True,  True,  True]])

And where they all met at the same time:

>>> R_cond = np.all(R >= thresholds, axis=1)
>>> R_cond
array([False, False, False,  True, False,  True,  True])

From there, you want the conditions to be met for a given window.

We'll use the fact that booleans can sum together, and convolution to apply the window:

>>> win_size = 2
>>> R_conv = np.convolve(R_cond, np.ones(win_size), mode="valid")
>>> R_conv
array([0., 0., 1., 1., 1., 2.])

The resulting array will have values equal to win_size at the indices where all conditions are met on the window range.

So let's retrieve the first of those indices:

>>> index = np.where(R_conv == win_size)[0][0]
>>> index
5

If such an index doesn't exist, it will raise an IndexError, I'm letting you handle that.

So, as a one-liner function, it gives:

def idx_conditions(arr, thresholds, win_size, condition):
    return np.where(
        np.convolve(
            np.all(condition(arr, thresholds), axis=1),
            np.ones(win_size),
            mode="valid"
        )
        == win_size
    )[0][0]

I added the condition as an argument to the function, to be more general.

>>> from operator import ge
>>> idx_conditions(R, thresholds, win_size, ge)
5

Hi @paime and welcome! Thanks for the detailed answer, it's awesome :) I have a question, what if I need to pass some `minimal_index` to the function `idx_conditions` and I will look for the `first_index` greater than the `minimal_index` such that the conditions are satisfied. I thinking about a while loop but something needs to vary from one iteration to another. — Joffrey L., Jul 06 '20 at 03:32
Hi @JoffreyL., you're welcome. To implement a `minimal_index`, just add it as an optional parameter : `def idx_conditions(..., minimal_index=0)`, then add `arr = arr[minimal_index:]` as the first line of the function, and eventually add `... + minimal_index` to the returned value. — paime, Jul 06 '20 at 09:38
Hi @paime, thank you :) Assume that the method couldn't found the index, how to make it return the index of the minimum possible time window for which the conditions are verified? — Best_fit, Jul 07 '20 at 16:28
@JoffreyL. You can make recursive call. Just try/catch when you get the index: `try: np.where(...)[0][0] except IndexError: return idx_conditions(... win_size-1) if win_size > 1 else False`. Or make it a `while` instead of recursive, to avoid re-evaluation of the `np.all()` line. Also, if you do it like this it means that `minimal_index` is stronger than the `win_size`. So this function will begin to be tricky, you may want to split it. — paime, Jul 07 '20 at 16:38
@paime Any hint on how to split it? The problem with the idea is that it's not sufficient if the win_size = 100 but the only possible window is for 2 periods, you will have tens of "useless" iterations — Best_fit, Jul 07 '20 at 18:44
Using a while around the convolution will be faster if `win_size` is not too far for the actual window size that will be found. If you are gonna use it with innapropriate `win_size `, you can use [this `O(n)` solution](https://stackoverflow.com/a/49693931/13636407) to search over `R_cond`, by adding a `curr_found_max` and a `curr_index_max`. See [this pastebin](https://pastebin.com/L0ymuR3z) for a possible implementation. — paime, Jul 07 '20 at 21:37

MrNobody33 · Answer 2 · 2020-07-05T23:04:04.600

1

This could be a way:

R1 = [7,5,8,6,0,6,7]

R2 = [8,0,2,2,0,2,2]

R3 = [1,7,5,9,0,9,9]

for i,inext in zip(range(len(R1)),range(len(R1))[1:]):
    if (R1[i]>=5 and R1[inext]>=5)&(R2[i]>=2 and R2[inext]>=2)&(R3[i]>=9 and R3[inext]>=9):
        print(i)

Output:

Edit: Generalization could be:

def foo(ls,conditions):
    index=0
    for i,inext in zip(range(len(R1)),range(len(R1))[1:]):
        if all((ls[j][i]>=conditions[j] and ls[j][inext]>=conditions[j])  for j in range(len(ls))):
            index=i
    return index


R1 = [7,5,8,6,0,6,7]

R2 = [8,0,2,2,0,2,2]

R3 = [1,7,5,9,0,9,9]

R4 = [1,7,5,9,0,1,1]

R5 = [1,7,5,9,0,3,3]



conditions=[5,2,9,1,3]
ls=[R1,R2,R3,R4,R5]

print(foo(ls,conditions))

Output:

And, maybe if the arrays match the conditions multiple times, you could return a list of the indexes:

def foo(ls,conditions):
    index=[]
    for i,inext in zip(range(len(R1)),range(len(R1))[1:]):
        if all((ls[j][i]>=conditions[j] and ls[j][inext]>=conditions[j])  for j in range(len(ls))):
            print(i)
            index.append(i)
    return index


R1 = [6,7,8,6,0,6,7]

R2 = [2,2,2,2,0,2,2]

R3 = [9,9,5,9,0,9,9]

R4 = [1,1,5,9,0,1,1]

R5 = [3,3,5,9,0,3,3]

conditions=[5,2,9,1,3]
ls=[R1,R2,R3,R4,R5]

print(foo(ls,conditions))

Output:

[0,5]

edited Jul 05 '20 at 23:04

answered Jul 05 '20 at 22:31

MrNobody33

6,413
7
19

Thanks :), I will test it against the basic version I described. How could we generalize it if there is `N` resources instead of `3` with `N` as input? – Best_fit Jul 05 '20 at 22:33
With N conditions too, i guess, yes? – MrNobody33 Jul 05 '20 at 22:35
yes but this *N* is not known in advance. I mean I cannot write `if (R1[i]>=a and R1[inext]>=a)&(R2[i]>=b and R2[inext]>=b)&(R3[i]>=c and R3[inext]>=c) ... (RN[i]>=c and RN[inext]>=c)` Those `...`does not exist. *N* could be 3, 6, 1, whatever. – Best_fit Jul 05 '20 at 22:37
your method seems to be the most efficient, I am trying to find a way to generalize it – Best_fit Jul 05 '20 at 22:51
thank you :) Do you have an idea why your solution is more efficient than both mine and the one proposed by @asalimih? I assume this question needs some under-the-hood knowledge of python. – Best_fit Jul 05 '20 at 23:20
I don't know how your solution is. I'm not sure why this happens to be honest because I don't know how the functions that @asalimih used are built. See this [link](https://stackoverflow.com/questions/31598677/why-list-comprehension-is-much-faster-than-numpy-for-multiplying-arrays): it shows how the creation of multidimensional arrays is slower than a simple list comprehension. Maybe that could be a reason for why this happens. For my solution I guess it's faster because of the list comprehension when using `all`. List comprehension are faster in most cases. – MrNobody33 Jul 06 '20 at 02:57
You also could see this helpful links: [Python Performance Tips: Optimizing Loops](https://nyu-cds.github.io/python-performance-tips/08-loops/#:~:text=List%20comprehension%20is%20faster%20because,than%20equivalent%20use%20of%20map%20.), [What are the advantages of NumPy over regular Python lists?](https://stackoverflow.com/questions/993984/what-are-the-advantages-of-numpy-over-regular-python-lists), [Python Lists vs. Numpy Arrays - What is the difference?](https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference). – MrNobody33 Jul 06 '20 at 02:58
For an exact explanation, maybe you could post another question about the performance of the solutions. – MrNobody33 Jul 06 '20 at 02:58

asalimih · Answer 3 · 2020-07-05T23:11:56.780

1

Here is a solution using numpy ,without for loops:

import numpy as np
R1 = np.array([7,5,8,6,0,6,7])
R2 = np.array([8,0,2,2,0,2,2])
R3 = np.array([1,7,5,9,0,9,9])
a = np.logical_and(np.logical_and(R1>=5,R2>=2),R3>=9)
np.where(np.logical_and(a[:-1],a[1:]))[0].item()

ouput

Edit:
Generalization
Say you have a list of lists R and a list of conditions c:

R = [[7,5,8,6,0,6,7],
     [8,0,2,2,0,2,2],
     [1,7,5,9,0,9,9]]
c = [5,2,9]

First we convert them to numpy arrays. the reshape(-1,1) converts c to a column matrix so that we can use pythons broadcasting feature in the >= operator

R = np.array(R)
c = np.array(c).reshape(-1,1)
R>=c

output:
array([[ True,  True,  True,  True, False,  True,  True],
       [ True, False,  True,  True, False,  True,  True],
       [False, False, False,  True, False,  True,  True]])

then we perform logical & operation between all rows using reduce function

a = np.logical_and.reduce(R>=c)
a
output:
array([False, False, False,  True, False,  True,  True])

next we create two arrays by removing first and last element of a and perform a logical & between them which shows which two subsequent elements satisfied the conditions in all lists:

np.logical_and(a[:-1],a[1:])
output:
array([False, False, False, False, False,  True])

now np.where just shows the index of the True element

np.where(np.logical_and(a[:-1],a[1:]))[0].item()
output:
5

edited Jul 05 '20 at 23:11

answered Jul 05 '20 at 22:38

asalimih

298
2
8

Thanks :), any idea how to generalize it for N lists with N as input (it could be 3, 6, 1, whatever)? – Best_fit Jul 05 '20 at 22:40
1

Generalization Added. – asalimih Jul 05 '20 at 22:46
Thanks for the detailed explanation :) Do you have an idea why your solution is more efficient than mine but less efficient than the one suggested by @MrNobody33. I assume this question needs some under-the-hood knowledge of python. – Best_fit Jul 05 '20 at 23:21
in terms of performance and run time my solution would be much better in comparison to `for` loop solutions in larger lists because of vectorized operations. – asalimih Jul 05 '20 at 23:25
Yup, I 've just verified it. How about the scaling not by list size but the number of lists? Does the same argument still hold? – Best_fit Jul 05 '20 at 23:38

how to implement this array algorithm in a more efficient way?

3 Answers3

Linked