Elements in list greater than or equal to elements in other list (without for loop?)

Question

I have a list containing 1,000,000 elements (numbers) called x and I would like to count how many of them are equal to or above [0.5,0.55,0.60,...,1]. Is there a way to do it without a for loop?

Right now I have the following the code, which works for a specific value of the [0.5,...1] interval, let's say 0.5 and assigns it to the count variable

count=len([i for i in x if i >= 0.5])

EDIT: Basically what I want to avoid is doing this... if possible?

obs=[]
alpha = [0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,1]

for a in alpha:
    count= len([i for i in x if i >= a])
    obs.append(count)

Thanks in advance Best, Mikael

I do not think you can do this without a loop. Even if you tend to use any package that does this for you, it would still be traversing the list behind the scenes using some looping mechanism. — unixia, Nov 20 '19 at 18:14

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

1

I don't think it's possible without loop, but you can sort the array x and then you can use bisect module (doc) to locate insertion point (index).

For example:

x = [0.341, 0.423, 0.678, 0.999, 0.523, 0.751, 0.7]
    
alpha = [0.5,0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95,1]

x = sorted(x)

import bisect

obs = [len(x) - bisect.bisect_left(x, a) for a in alpha]

print(obs)

Will print:

[5, 4, 4, 4, 3, 2, 1, 1, 1, 1, 0]

Note:

sorted() has complexity n log(n) and bisect_left() log(n)

edited Jun 20 '20 at 09:12

Community

1
1

answered Nov 20 '19 at 18:43

Andrej Kesely

168,389
15
48
91

Let's say len(alpha) is m. This means we're looking at O((n + m) log n). OP's original is O(m * n). For m << n, OP wins, but for m ~= n, you win. Same applies to my answer, which is just shorter because of numpy. – Mad Physicist Nov 20 '19 at 18:58

score 0 · Answer 2 · answered Nov 20 '19 at 18:05

0

You can use numpy and boolean indexing:

>>> import numpy as np
>>> a = np.array(list(range(100)))
>>> a[a>=50].size
50

answered Nov 20 '19 at 18:05

Brad Campbell

2,969
2
23
21

Thanks for your reply. But my question is what I should do if I want to find the size of >=50, >=55, ... and so forth (where I have 50,55,... saved as a list) – Mikael Olai Milhøj Nov 20 '19 at 18:24

jdehesa · Answer 3 · 2019-11-21T09:52:18.530

EDIT: If you are using NumPy already, you can simply do this:

import numpy as np

# Make random data
np.random.seed(0)
x = np.random.binomial(n=20, p=0.5, size=1000000) / 20
bins = np.arange(0.55, 1.01, 0.05)
# One extra value for the upper bound of last bin
bins = np.append(bins, max(bins.max(), x.max()) + 1)
h, _ = np.histogram(x, bins)
result = np.cumsum(h)
print(result)
# [280645 354806 391658 406410 411048 412152 412356 412377 412378 412378]

If you are dealing with large arrays of numbers, you may considering using NumPy. But if you are using simple Python lists, you can do that for example like this:

def how_many_bigger(nums, mins):
    # List of counts for each minimum
    counts = [0] * len(mins)
    # For each number
    for n in nums:
        # For each minimum
        for i, m in enumerate(mins):
            # Add 1 to the count if the number is greater than the current minimum
            if n >= m:
                counts[i] += 1
    return counts

# Test
import random
# Make random data
random.seed(0)
nums = [random.random() for _ in range(1_000_000)]
# Make minimums
mins = [i / 100. for i in range(55, 101, 5)]
print(mins)
# [0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0]
count = how_many_bigger(nums, mins)
print(count)
# [449771, 399555, 349543, 299687, 249605, 199774, 149945, 99928, 49670, 0]

Hi, My x is given as ```x=np.random.binomial(n=20, p=0.5,size=1000000)``` — Mikael Olai Milhøj, Nov 20 '19 at 18:32
@MikaelOlaiMilhøj The values returned by that function are almost always greater than 1. — jdehesa, Nov 20 '19 at 19:06

score 0 · Answer 4 · answered Nov 20 '19 at 18:20

0

Even if you are not using for loop, internal methods use them. But iterates them efficiently.

you can use below function without for loop from your end.

x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
l = list(filter(lambda _: _ > .5 , x))
print(l)

answered Nov 20 '19 at 18:20

Durai

505
4
12

Thanks. I have never seen the _ before. What is that? (sorry, new to python, as you may have figured out already!) – Mikael Olai Milhøj Nov 20 '19 at 18:22
@MikaelOlaiMilhøj `_` is a valid parameter name, but it is conventionally used for ignored parameters, it's not recommended if you are actually using its value (see [What is the purpose of the single underscore “_” variable in Python?](https://stackoverflow.com/q/5893163)). – jdehesa Nov 20 '19 at 18:35

Mad Physicist · Answer 5 · 2019-11-20T19:05:39.067

Based on comments, you're ok with using numpy, so use np.searchsorted to simply insert alpha into a sorted version of x. The indices will be your counts.

If you're ok with sorting x in-place:

x.sort()
counts = x.size - np.searchsorted(x, alpha)

If not,

counts = x.size - np.searchsorted(np.sort(x), alpha)

These counts assume that you want x < alpha. To get <= add the keyword side='right':

np.searchsorted(x, alpha, side='right')

PS

There are a couple of significant problems with the line

count = len([i for i in x if i >= 0.5])

First of all, you're creating a list of all the matching elements instead of just counting them. To count them do

count = sum(1 for i in x if i >= threshold)

Now the problem is that you are doing a linear pass through the entire array for each alpha, which is not necessary.

As I commented under @Andrej Kesely's answer, let's say we have N = len(x) and M = len(alpha). Your implementation is O(M * N) time complexity, while sorting gives you O((M + N) log N). For M << N (small alpha), your complexity is approximately O(N), which beats O(N log N). But for M ~= N, yours approaches O(N^2) vs my O(N log N).

Elements in list greater than or equal to elements in other list (without for loop?)

5 Answers5