Speed test among set, list and tuple in Python gives surprising results

Question

I tried to test the speed among set,list and tuple and got surprising results.

Before this, I have known that set is faster than list based on this answer.

Here is my test code:

import timeit,time
from sys import getsizeof as Size

List_Test = [range(1000)]
print("The Size of List is : {}".format(Size(List_Test)))
Set_Test = set(range(1000))
print("The Size of Set is : {}".format(Size(Set_Test)))
Tuple_Test = tuple(range(1000))
print("The Size of Tuple is : {}".format(Size(Tuple_Test)))


print("\nNow is to test speed\n")
time.sleep(3)

def Create_List():
    List = [i for i in range(1000)]

def Test_List():
    for i in List_Test:
        if i == 6:
            break

def Create_Set():
    Set = set(i for i in range(1000))

def Test_Set():
    for i in Set_Test:
        if i == 6:
            break

def Create_Tuple():
    Tuple = tuple(i for i in range(1000))

def Test_Tuple():
    for i in Tuple_Test:
        if i == 6:
            break

t = timeit.repeat(stmt="Create_List()",number=1000,setup="from __main__ import Create_List", repeat=30)
print("The Time of Create_List : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Create_Tuple()",number=1000,setup="from __main__ import Create_Tuple", repeat=30)
print("The Time of Create_Tuple : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Create_Set()",number=1000,setup="from __main__ import Create_Set", repeat=30)
print("The Time of Create_Set : {}".format(sum(t)/len(t)))

print("\n")

t = timeit.repeat(stmt="Test_List()",number=1000,setup="from __main__ import Test_List", repeat=30)
print("The Time of Test_List : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Test_Tuple()",number=1000,setup="from __main__ import Test_Tuple", repeat=30)
print("The Time of Test_Tuple : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Test_Set()",number=1000,setup="from __main__ import Test_Set", repeat=30)
print("The Time of Test_Set : {}".format(sum(t)/len(t)))

print("\nThe end")

Finally, I found that:

Size: Set > Tuple > List
Speed: List > Tuple > Set

I think my test code is wrong. What's wrong with it?

Edit:

I changed the test code:

List_Test = list(range(500000))
......

def Test_List():
    randomNumber = random.randint(0,500000)
    for i in List_Test:
        if i == randomNumber:
            break

# Other test code is the same

And the result of the test always is list ≈ tuple > set.

When change the number 500000(x20) to 10000000,

It sometimes is list ≈ tuple ≈ set,but often list ≈ tuple > set.

May I infer that only when all of them have the same length(And the number of length is large),we can use set (Although its size is much larger than tuple and list)?

Variable and function names should follow the `lower_case_with_underscores` style. — AMC, Mar 18 '20 at 19:10

ggorlen · Accepted Answer · 2020-03-18T20:23:52.357

There are many problems with this test suite.

[range(1000)] isn't equivalent to the other two declarations. This makes a list with one element, and that single element points to the range. getsizeof is not recursive, so it only gives the size of the outer object. This can be illustrated by creating lists of a single element of different range sizes and noticing that the getsizeof call always gives the same result.

If you use list(range(1000)), you'll get a reasonable result that's about the same size as a tuple. I'm not sure what information is gained in making these lists 1000, though--why not compare sizes of the three empty elements? Even here, this is basically an implementation-dependent test that doesn't really have much bearing on how you might write a Python program, as far as I can tell. Sure, set is a bit larger as you'd expect (hash-based structures generally have more overhead than lists) and that varies from version to version.

As for the "create" test, consider set(i for i in range(1000)). In no way does this test the time it takes to create a set because most of the time is spent creating and running the generator that's being passed as a parameter. As with the last test, I'm not sure what this proves even if the test were fair (i.e. you used the list() initializer instead of a list comprehension). As with the "size" tests, you can just call the initializers with empty lists, but all this shows is that creation times are practically the same: there's some function call and object allocation overhead which is implementation-specific.

Lastly, the speed tests for lookup operations:

for i in Set_Test:
    if i == 6:
        break

This hardcodes a best-case scenario. List and tuple lookups perform a linear search, so the target is always found in 7 comparisons. This is going to be nearly identical to a set lookup, which is O(1) but requires some complicated hashing operations that add overhead. This test should use random indices where the distribution means the lists and tuples will be hit with worst-case scenarios regularly. The set should outperform lists and tuples done correctly.

Furthermore, a statement like "a set is faster than a list" has almost no meaning--data structures can't be compared like this, and a word like "faster" speaks to specific run-time conditions that are highly variable and case-specific. Comparing the data structures requires comparison of their operations' time complexities which help describe their fitness for some purpose.

To summarize: the first two tests aren't worth performing even if written correctly and the last test will show that set lookups are O(1) while list/tuple lookups are O(n) if they're made fair using random target items.

I close my computer, I will try it tomorrow. And give you the feedback.Thanks — jizhihaoSAMA, Mar 18 '20 at 18:04
I use single variable method in the "create" test.(So in all of the test,I use iterator.I use it to know which is the fastest way among them when create them.) — jizhihaoSAMA, Mar 19 '20 at 02:11
`for i in Set_Test:` is wrong because you're performing a linear search on the set. The correct way to look up an item in a set is `item in some_set`. Same with list and any other structure. Each check should be `return item in set` basically. — ggorlen, Mar 19 '20 at 02:42
Well,I get the right answer.Thanks! And Is my ``create`` test fair? — jizhihaoSAMA, Mar 19 '20 at 02:47
Yes, but as my post says, there's no point in benchmarking the creation. They're all basically the same and implementation-dependent, so the only thing worth comparing are lookups, insertions and other algorithms that the data structures offer. For example, you could compare removing random items from a list against random items from a set. Run your tests a few hundred thousand times using random numbers. Which do you think will be faster? — ggorlen, Mar 19 '20 at 02:52
Oh,``set`` is fast.okay,I learn a lot.Thanks you,kind stranger. — jizhihaoSAMA, Mar 19 '20 at 02:58
No problem. Thanks for your patience. Just to nitpick, please try to avoid saying "x is fast". Saying "the time complexity of lookups for sets are, on average, O(1)". Speed and time complexity are two totally different things. If your list is 5 elements, I bet the list lookups will be faster, for example. Set lookups are slower than list lookups in terms of constant-time overhead, but there's a point where `n` grows large enough that the set will outperform the list. You might try figuring out where this is for some test case. — ggorlen, Mar 19 '20 at 03:00

score 2 · Answer 2 · answered Mar 18 '20 at 17:46

2

List_Test = [range(1000)] creates a one-item list that contains a range object without actually making the range generator produce any output.

You should use the list constructor instead to actually create a list with the generator output of the range object unpacked:

List_Test = list(range(1000))

answered Mar 18 '20 at 17:46

blhsing

91,368
6
71
106

1

I close my computer,I will try it tomorrow.And give you the feedback. – jizhihaoSAMA Mar 18 '20 at 18:02

score 0 · Answer 3 · answered Mar 18 '20 at 17:48

Maybe try:

#!/usr/local/cpython-3.8/bin/python3

import timeit,time
from sys import getsizeof as Size

List_Test = list(range(1000))
print("The Size of List is  : {}".format(Size(List_Test)))
Set_Test = set(range(1000))
print("The Size of Set is   : {}".format(Size(Set_Test)))
Tuple_Test = tuple(range(1000))
print("The Size of Tuple is : {}".format(Size(Tuple_Test)))


print("\nNow is to test speed\n")
time.sleep(3)

def Create_List():
    List = [i for i in range(1000)]

def Test_List():
    if 500 in List_Test:
        pass

def Create_Set():
    Set = set(i for i in range(1000))

def Test_Set():
    if 500 in Set_Test:
        pass

def Create_Tuple():
    Tuple = tuple(i for i in range(1000))

def Test_Tuple():
    if 500 in Tuple_Test:
        pass

t = timeit.repeat(stmt="Create_List()",number=1000,setup="from __main__ import Create_List", repeat=30)
print("The Time of Create_List  : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Create_Tuple()",number=1000,setup="from __main__ import Create_Tuple", repeat=30)
print("The Time of Create_Tuple : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Create_Set()",number=1000,setup="from __main__ import Create_Set", repeat=30)
print("The Time of Create_Set   : {}".format(sum(t)/len(t)))

print("\n")

t = timeit.repeat(stmt="Test_List()",number=1000,setup="from __main__ import Test_List", repeat=30)
print("The Time of Test_List  : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Test_Tuple()",number=1000,setup="from __main__ import Test_Tuple", repeat=30)
print("The Time of Test_Tuple : {}".format(sum(t)/len(t)))
t = timeit.repeat(stmt="Test_Set()",number=1000,setup="from __main__ import Test_Set", repeat=30)
print("The Time of Test_Set   : {}".format(sum(t)/len(t)))

print("\nThe end")

Speed test among set, list and tuple in Python gives surprising results

3 Answers3