
Basically, I am wondering what the most efficient way is to find the elements of a Python list whose values are greater than, say, n.

I believe the easiest, yet not the most efficient, way is the loop below:

subList = []
for i in range(len(theList)):
    if theList[i] > n:
        subList.append(theList[i])

Moreover, we have the one-line version below:

(x for x in theList if x > n)

(Please correct me if there is anything wrong with the above syntax)

Finally, we can use the filter() function, which is not pleasant to use, at least for me.
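
For reference, a filter() call for this task could look like the sketch below (same theList and n as in the snippets above; in Python 3, filter() returns an iterator, so it is wrapped in list()):

# filter keeps only the elements for which the lambda returns True
subList = list(filter(lambda x: x > n, theList))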

These are all the methods I know. If you know a better one, please tell me; otherwise, please explain which of them is the best in terms of efficiency and run-time.

M.Hossein Rahimi
  • Why isn't `filter` pleasant to use for you? – Woohoojin Mar 16 '19 at 01:29
  • If you are just trying to return a list with only elements where the value is greater than `n`, then your comprehension is a good way to go. If you mean to return a list rather than a generator, you can modify it as `[elem for elem in elems if elem > n]` – benvc Mar 16 '19 at 01:32
  • @benvc I am asking about a comparison of run-times. The mentioned question is just pointing out the method. – M.Hossein Rahimi Mar 16 '19 at 01:48

2 Answers


There is no universally right answer to this, and there have been a few SO posts about the speed of different approaches when handling lists; see e.g. here, here or here.

Which way is fastest may depend a lot on your list. That said, let's have a look at how fast the suggested approaches are.

For simple comparisons like this you can use timeit:

1. Case: The for-loop

for_case = """newList = []
for x in theList:
    if x > n:
        newList.append(x)"""

2. Case: List comprehension

list_comp = '[x for x in theList if x > n]'

3. Case: The filter (somewhat disliked)

filtering = 'list(filter(lambda x: x > n, theList))'

Some preparation:

import timeit
si = 'theList=range(2000);n=1000;'  # using list(range(2000)) has no effect on the ranking

So let's see:

timeit.timeit(si+list_comp, number=10000)
Out[21]: 1.3985847820003983
timeit.timeit(si+filtering, number=10000)
Out[22]: 3.315784254024038
timeit.timeit(si+for_case, number=10000)
Out[23]: 2.0093530920275953

So, at least on my machine, the list comprehension takes it away, followed by the for-loop, and, at least in this case, the disliked filter is indeed the slowest.
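
As a side note on the point above that the fastest way may depend on your list: if the list happens to be sorted already, a bisect-based slice avoids comparing every element. This is only a hedged sketch for that special case; it was not part of the benchmark above:

import bisect

theList = list(range(2000))  # assumed to be sorted for this approach
n = 1000
# bisect_right returns the first index whose value is greater than n,
# so slicing from there yields every element > n with only O(log n) comparisons
subList = theList[bisect.bisect_right(theList, n):]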

j-i-l
  • Thank you. I believe this is the ultimate answer :) The only question left to be settled now is if there are any other ways to do so? – M.Hossein Rahimi Mar 16 '19 at 02:03
  • @HosseinRahimi You are welcome. There are a few questions about speed here on SO, e.g. [here](https://stackoverflow.com/questions/3013449/list-comprehension-vs-lambda-filter), [here](https://stackoverflow.com/questions/1247486/list-comprehension-vs-map) or [here](https://stackoverflow.com/questions/1632902/lambda-versus-list-comprehension-performance). All are a good read if you are interested in this subject! – j-i-l Mar 16 '19 at 02:07
  • How is this answer different from the one posted already? Is there a need for a for loop? Or filter? – Ijaz Ahmad Mar 16 '19 at 02:08
  • @HosseinRahimi there are certainly other ways, but when it comes to the fastest ones, those 3 are about it, I think. – j-i-l Mar 16 '19 at 02:13

List comprehension version:

sublist = [i for i in the_list if i > n]

Generator expression (if the list is huge):

sublist = (i for i in the_list if i > n)
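
To illustrate the memory point: the generator does not build the filtered list up front; it yields matching elements one at a time as they are consumed. A small usage sketch, assuming the_list and n are defined as above:

# consume the generator lazily, without materialising the filtered list
total = sum(i for i in the_list if i > n)

# or materialise it explicitly when an actual list is needed
sublist = list(i for i in the_list if i > n)
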
Ijaz Ahmad