The fastest way to do value selection?

Question

I have a 2D list comprehension that sets each value to either 1 or 0, depending on the first condition that matches.

Since it's relatively slow, I wonder whether there is a NumPy function or a library that could speed this up.

Note: the subarrays only have equal length at the same index.

result = [
    [1 if ratUp > ratDown else 0 if ratDown > ratUp else 0 if pointsDown > pointsUp else 1
     for ratUp, ratDown, pointsUp, pointsDown
     in zip(ratiosUpSlice, ratiosDownSlice, upPointsSlice, downPointsSlice)]
    for ratiosUpSlice, ratiosDownSlice, upPointsSlice, downPointsSlice
    in zip(RatiosUp, RatiosDown, UpPointsSlices, DownPointsSlices)
]

Reproducible example:

import numpy as np
LEN = 10000
temp = np.random.randint(1,high=100, size=LEN) 
RatiosUp         = [np.random.uniform(size=rand) for rand in temp]
RatiosDown       = [np.random.uniform(size=rand) for rand in temp]
UpPointsSlices   = [np.random.uniform(size=rand) for rand in temp]
DownPointsSlices = [np.random.uniform(size=rand) for rand in temp]

List comprehensions are not always faster. Try rewriting your code with ordinary `if` and `for` statements; that will also make it easier to comprehend. As of now, you seem to perform too many actions per iteration. – go2nirvana, Sep 15 '20 at 08:24
Please put some parentheses in your conditional. I don't think it really makes sense, or at least you have a bunch of redundancies. – Mad Physicist, Sep 15 '20 at 14:24
@mkrieger Is the reproducible sample what you mean by input? If so, then please run both in your console. This should create an output in a relatively short time. (If this is not a proper answer to what you were asking, please excuse my misunderstanding and be so kind as to clear it up for me.) – La-Li-Lu-Le-Low, Sep 15 '20 at 21:39

Mad Physicist · Accepted Answer · 2020-09-17T19:35:49.820


You can modify the way you do the processing to do all the operations quickly in numpy, and then split the final result (if you really need to). There is nothing fundamentally 2D about your data: everything is done per-element.

Let's look at how you generate the input data first. You can generate all the data as arrays rather than lists:

import numpy as np

LEN = 10000
sizes = np.random.randint(1, 100, size=LEN)
n = sizes.sum()
ratios_up = np.random.uniform(size=n)
ratios_down = np.random.uniform(size=n)
up_point_slices = np.random.uniform(size=n)
down_point_slices = np.random.uniform(size=n)

It should be pretty easy to visualize the loop as a single numpy operation now:

result = (ratios_up > ratios_down) | ((ratios_up == ratios_down) & (up_point_slices >= down_point_slices))
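As a quick sanity check of that expression, here is a minimal sketch on hand-picked values that cover every branch of the original conditional, including ties. The helper `select_scalar` and the sample values are illustrative assumptions, not part of the question.

```python
import numpy as np

def select_scalar(rat_up, rat_down, pts_up, pts_down):
    # Scalar form of the chained conditional from the question
    if rat_up > rat_down:
        return 1
    if rat_down > rat_up:
        return 0
    if pts_down > pts_up:
        return 0
    return 1

# Hand-picked values: ratio up wins, ratio down wins, then three ratio ties
rat_up   = np.array([0.9, 0.1, 0.5, 0.5, 0.5])
rat_down = np.array([0.1, 0.9, 0.5, 0.5, 0.5])
pts_up   = np.array([0.0, 1.0, 0.2, 0.8, 0.4])
pts_down = np.array([1.0, 0.0, 0.8, 0.2, 0.4])

# Ratios decide; points only break ties between equal ratios
vectorized = (rat_up > rat_down) | ((rat_up == rat_down) & (pts_up >= pts_down))
expected = [select_scalar(*row) for row in zip(rat_up, rat_down, pts_up, pts_down)]

print(vectorized.astype(int).tolist())  # [1, 0, 0, 1, 1]
print(expected)                         # [1, 0, 0, 1, 1]
```

The vectorized result is a boolean array; `.astype(int)` converts it to the 1/0 values the comprehension produces.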

If you need the result split into arrays:

result = np.split(result, np.cumsum(sizes[:-1]))
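If the data already exists as lists of arrays, as in the question, one option is to flatten each list with `np.concatenate`, evaluate the expression once, and split the result back. The sketch below assumes that approach and compares the outcome with the original nested comprehension; the seed, the reduced `LEN`, and the short loop variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed so the check is repeatable
LEN = 1000                      # smaller than the question's 10000 to keep it quick
sizes = rng.integers(1, 100, size=LEN)
RatiosUp         = [rng.uniform(size=s) for s in sizes]
RatiosDown       = [rng.uniform(size=s) for s in sizes]
UpPointsSlices   = [rng.uniform(size=s) for s in sizes]
DownPointsSlices = [rng.uniform(size=s) for s in sizes]

# Reference: the nested comprehension from the question
reference = [
    [1 if r_up > r_dn else 0 if r_dn > r_up else 0 if p_dn > p_up else 1
     for r_up, r_dn, p_up, p_dn in zip(r_ups, r_dns, p_ups, p_dns)]
    for r_ups, r_dns, p_ups, p_dns
    in zip(RatiosUp, RatiosDown, UpPointsSlices, DownPointsSlices)
]

# Flatten each list of arrays into a single 1-D array
ratios_up = np.concatenate(RatiosUp)
ratios_down = np.concatenate(RatiosDown)
points_up = np.concatenate(UpPointsSlices)
points_down = np.concatenate(DownPointsSlices)

flat = (ratios_up > ratios_down) | ((ratios_up == ratios_down) & (points_up >= points_down))

# Split back at the cumulative sizes, excluding the trailing index
result = np.split(flat.astype(int), np.cumsum(sizes[:-1]))

matches = len(result) == len(reference) and all(
    np.array_equal(res, ref) for res, ref in zip(result, reference)
)
print(matches)
```

Passing the full `np.cumsum(sizes)` to `np.split` would append an empty trailing array, which is why the last size is dropped before taking the cumulative sum.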

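If you are committed to the split, you can generate the data more compactly by packing the ratios and points into two columns of a single array (here column 0 is taken to hold the ratios and column 1 the points):

splits = np.cumsum(np.random.randint(1, 100, size=LEN))
up = np.random.uniform(size=(splits[-1], 2))
down = np.random.uniform(size=(splits[-1], 2))

# Ratios decide; points only break ties between equal ratios.
# A plain (up > down).any(1) would also return True when the ratio
# comparison fails but the points comparison succeeds.
mask = (up[:, 0] > down[:, 0]) | ((up[:, 0] == down[:, 0]) & (up[:, 1] >= down[:, 1]))
result = np.split(mask, splits[:-1])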

edited Sep 17 '20 at 19:35

answered Sep 15 '20 at 14:32

Mad Physicist


Thanks for the answer. I am not sure the vectorized expression has the same effect as the chained "if else" statement in my sample, because the order matters: if A, then 1; if B, then 0; if neither, check C; if C, then 0; otherwise just set 1. To find out whether the results match, I compared both outputs (using the same "sizes" array), and somehow your code produced one more sample than the sample code above. Why is that? – La-Li-Lu-Le-Low Sep 16 '20 at 04:37
Could you show how to compare them via all(np.allclose(a,b,equal_nan=True) for a,b in zip(res1,res2))? (I get different results when I leave out the final or the first sample.) – La-Li-Lu-Le-Low Sep 16 '20 at 04:40
@La-Li-Lu-Le-Low. Your conditions are closely related. If `ratUp != ratDown`, then `ratUp > ratDown` is the same as `not ratDown > ratUp`. So the first two conditions are really just `ratUp > ratDown`. The only way the last part can kick in is if `ratUp == ratDown`. This will almost never happen for floats with 53 bits of precision. But either way, the last condition is just `not pointsDown > pointsUp`, i.e., `pointsUp >= pointsDown`. – Mad Physicist Sep 16 '20 at 04:49
@La-Li-Lu-Le-Low. If you are willing to accept the approximation that `ratUp != ratDown`, drop the part after `|`. Otherwise, I've fixed the condition – Mad Physicist Sep 16 '20 at 04:51
I'll debug tomorrow – Mad Physicist Sep 16 '20 at 04:54
@La-Li-Lu-Le-Low. I always ended up with one extra sample because the indices to `split` should not include the trailing index, but cumsum does. I've fixed the issue. – Mad Physicist Sep 17 '20 at 19:36
