Best practice to write sequence constants as tuple? And as set when testing for inclusion?

Question

If we have small sequence constants in our code, is it best practice to write them as tuple rather than as a list? Or does it not really matter?

e.g. ORDER_TYPES = (1, 2, 3) vs ORDER_TYPES = [1, 2, 3]

And when we intend to use inclusion operations against that sequence, does it make sense to store as a set?

e.g.

ORDER_TYPES = {1, 2, 3}
order_type = 1
if order_type in ORDER_TYPES:
    ...

Are there any actual speed/space benefits to storing as tuple vs list, or as a set, when the sequences are small like this? I know this falls into the realm of over-optimizing but just wondering what is actually considered best practice.

In one word: benchmark. This depends on so many things, including the exact version of Python you are using, so the best thing to do is to profile each approach — DeepSpace, Jun 28 '22 at 11:54
When performance is not of concern (like in your example), the best practice is to **use the data structure that is closest to what you want to represent** with your code. It sounds like `ORDER_TYPES` is a collection of unique, non-ordered elements, so I would go with set. The fact that set membership checking is `O(1)` compared to list/tuple being `O(n)` is just a nice bonus, but your main focus should be on how to properly model the real-life stuff through your code. — jfaccioni, Jun 28 '22 at 11:57
@jfaccioni TBH, it sounds like `ORDER_TYPES` should really be an `Enum` (which may or may not be applicable to what OP is asking about, but they did mention "best-practices") — DeepSpace, Jun 28 '22 at 12:02

score 2 · Answer 1 · answered Jun 28 '22 at 11:49

Using tuples is recommended,

Actually in your case there not much difference as they are not too many and they are not being used a lot.

but when you mentioned best practice in your question although you can(allowed) to use any data structure evens strings but there is no difference between your question and general best practice cases.

so take a close look at This link or other similar links.

Also Here explains with more details.

Cardstdani · Answer 2 · 2022-06-28T11:52:56.670

-1

The main difference between lists and sets are the efficiency mainly in operations like in. For example:

l = [1, 2, 3]
s = {1, 2, 3}

print(1 in l, 1 in s)

In this case, using a set is much more efficient than a list because sets work with hash maps on the low level, so it will have O(1) time complexity for checking if an element is contained in a set. Meanwhile, lists and tuples take O(n) complexity, which is much more expensive. Also, using set allows you to insert elements in O(1) complexity too.

You can learn more about both structures with this resource.

edited Jun 28 '22 at 11:52

answered Jun 28 '22 at 11:49

Cardstdani

4,999
3
12
31

2

This is very questionable when using such a small dataset. A set requires hashing when creating the set **and** every time `in` is used. It is not guaranteed at all that it will be faster than using a tuple on all systems in every usecase. – DeepSpace Jun 28 '22 at 11:51
@deepspace But it's scalable when there are a lot of orders and data – Cardstdani Jun 28 '22 at 11:52
2

OP specifically asks about small datasets, *"when the sequences are small like this"* – DeepSpace Jun 28 '22 at 11:52
@deepspace But he may encounter a situation where he must upscale his code efficiency – Cardstdani Jun 28 '22 at 11:53
2

But he does not ask about that situation... – DeepSpace Jun 28 '22 at 11:55
1

@Cardstdani to be fair to any potential downvoters, the code in your answer just demonstrates that both lists and sets support the `in` operator, not how it actually differs between both data structures (efficiency-wise). – jfaccioni Jun 28 '22 at 12:03
2

Also "using a set is much more efficient" is too strong of a statement, especially since it isn't true for one-off collections like the ones the OP is asking about. Creating a set (even when optimized to a `LOAD_CONST`) is a very expensive operation. You need to use a set many times before that cost pays off. Back-of-the-envelope benchmarks using `timeit` indicate that on my system sets perform _worse_ than tuples if you're using `in` four times or fewer. – Brian61354270 Jun 28 '22 at 12:29

Best practice to write sequence constants as tuple? And as set when testing for inclusion?

2 Answers2