I'm trying to beef up my "best practices," and I'm reading more about lists vs. tuples and memory allocation, and how you should use tuples if the list is not going to be changed as the program runs.

That being said, should you (almost) always convert from a list into a tuple if this is the case?

For example, let's say I have this code, and I'm looking at 100 colors input from users:

with open("colors.txt", "r") as file:
    lst = [line.strip() for line in file.readlines()]

I'm not planning on mutating the list. Does that mean I should follow with:

tup = tuple(lst)

and work off of tup?

I realize this is a pretty small example, and converting to a tuple only takes 1 line, but is this the best practice? It just feels a little clunky since it's new to me.

Thanks!

will-hedges
  • Why not just use a `tuple` to begin with? `color_tup = tuple(line.strip() for line in file.readlines())` – ddejohn Mar 27 '21 at 16:05
  • Or `tuple(map(str.strip, file))`. Either a list or a tuple will work fine. – khelwood Mar 27 '21 at 16:06
  • Good point @khelwood, since it's only one operation, using `map` is definitely better. – ddejohn Mar 27 '21 at 16:07
  • OP, "best practice" in this case would be simply using a `tuple` to begin with, as described in the previous comments. That, and using better variable names than `lst` :D As far as using `list` over `tuple`, it really does depend on your use-case. Definitely use `tuple` in cases where you don't plan on modifying the contents. Or `set` if you don't care about order and want to enforce unique elements (and constant time membership checks!). – ddejohn Mar 27 '21 at 16:10
  • @blorgon and @khelwood thank you for the responses and explanations. I only used `lst` and `tup` for the example but I'll definitely look for ways to utilize tuples going forward. Would you mind posting as an answer so I can accept it for someone else who comes along? – will-hedges Mar 27 '21 at 16:20
  • The difference in memory consumption between a list and a tuple with the same elements is unlikely to be significant on modern machines unless you are handling really big data. Semantically, using a tuple indicates that the position of an element in the collection is significant, for example in `('Jane', 'Smith')` 'Jane' is the given name, 'Smith' is the family name. In a list, the position is irrelevant: in `['red', 'blue', 'yellow']` all the elements are colours, being first or last in the list is of no importance. – snakecharmerb Mar 27 '21 at 16:23
  • @snakecharmerb I appreciate the explanation, I hadn't considered using tuples where I would be specifically using position. I'll have to remember that – will-hedges Mar 27 '21 at 16:27

3 Answers

You can create a tuple directly instead of converting from a list; e.g.

tuple(map(str.strip, file))

If you want to allow your sequence to be changed, use a list. If you want to ensure the sequence cannot be changed, use a tuple. If it doesn't matter either way, you can use either one.
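
As a minimal sketch of that difference (the color values are invented for illustration), a list accepts in-place changes while a tuple raises `TypeError` on item assignment:

```python
# Illustrative only: the color values are made up for this sketch.
colors_list = ["red", "green", "blue"]
colors_tuple = tuple(colors_list)

# A list can be changed in place.
colors_list[0] = "crimson"
colors_list.append("yellow")

# A tuple rejects item assignment with a TypeError.
try:
    colors_tuple[0] = "crimson"
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

print(colors_list)
print(colors_tuple)
print(tuple_was_modified)
```

Note that the tuple only fixes which objects it holds; a mutable object stored inside a tuple can still be changed.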

khelwood

I suppose I'll answer, as requested by OP.

First, "best practice" in cases where you do not plan to modify a collection generally means using a tuple, yes. You can skip the "clunkiness" of casting by simply avoiding creating a list in the first place:

with open("colors.txt", "r") as f:
    colors = tuple(map(str.strip, f))

credit to @khelwood

Or, for more complicated expressions, pass a generator expression to tuple:

tuple(some_expression for item in some_iterable)
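
A hedged example of the generator-expression form, assuming the input may contain blank lines and mixed case. `io.StringIO` stands in for the opened file so the snippet runs on its own, and the file contents are invented:

```python
import io

# Stand-in for the opened colors.txt; the contents are invented for this sketch.
fake_file = io.StringIO("  Red \n\nGREEN\n blue\n")

# Strip whitespace, skip blank lines, and normalize case in one pass,
# building the tuple directly without an intermediate list.
colors = tuple(line.strip().lower() for line in fake_file if line.strip())

print(colors)
```

With a real file, the same expression would go inside the `with open(...)` block in place of `fake_file`.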

As @snakecharmerb points out, using a tuple is also a good idea when the position of each element is significant. Since tuple objects are immutable, the order in which elements are placed in the tuple is "set in stone," if you will. This signals to the reader that order matters for your data.
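
To make the positional point concrete, here is a small sketch built on the `('Jane', 'Smith')` example from the comments; the variable names are chosen only for illustration:

```python
# Position carries meaning: index 0 is the given name, index 1 the family name.
person = ("Jane", "Smith")
given_name, family_name = person  # unpacking relies on that fixed layout

# Every element plays the same role here; position says nothing extra.
colours = ["red", "blue", "yellow"]

print(f"{family_name}, {given_name}")
print(sorted(colours))
```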

For times when order is NOT important and you want to enforce unique elements, a set is ideal: sets are still iterable but also offer constant-time membership checks (on average), which can be helpful in some situations.
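
A short sketch of the set case, assuming the user input contains repeats (the values are invented):

```python
# Hypothetical user input with repeats; the values are invented for this sketch.
submitted = ["red", "blue", "red", "green", "blue"]

unique_colors = set(submitted)      # duplicates collapse, order is not preserved
has_red = "red" in unique_colors    # hash lookup, constant time on average
has_pink = "pink" in unique_colors

print(len(unique_colors), has_red, has_pink)
```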

ddejohn

I do not agree that a tuple is better. Perhaps the answer is different in older versions of Python; for Python 3.8.5, however, it depends on what 'better' means.

Take the following.

from sys import getsizeof
import numpy as np

a = np.array([i for i in range(10000000)])
b = [i for i in range(10000000)]
c = tuple([i for i in range(10000000)])

sum_a = np.sum(a)   # timeit 4.04 ms ± 71.2 µs per loop
sum_b = sum(b)      # timeit 199 ms ± 2.32 ms per loop
sum_c = sum(c)      # timeit 195 ms ± 2.72 ms per loop

print(f"numpy sum {sum_a} {getsizeof(a)}")
print(f"list  sum {sum_b} {getsizeof(b)}")
print(f"tuple sum {sum_c} {getsizeof(c)}")

The output is:

numpy sum    -2014260032 40000096
list  sum 49999995000000 81528048
tuple sum 49999995000000 80000040

For speed, numpy is the winner; however, here the speed increase comes at the cost of fixed-width integer overflow and an incorrect answer. So take care.
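
The numpy figures above are consistent with 32-bit wraparound: 40000096 bytes for ten million elements is roughly 4 bytes per element. Assuming the array's dtype was a 32-bit integer (the answer does not state it), the following pure-Python sketch reproduces the negative result. `wrap_int32` is a hypothetical helper written for this illustration, not a numpy function:

```python
def wrap_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF              # keep only the low 32 bits
    return value - 2**32 if value >= 2**31 else value

true_sum = sum(range(10_000_000))    # Python ints never overflow
wrapped = wrap_int32(true_sum)

print(true_sum)   # 49999995000000
print(wrapped)    # -2014260032
```

If that assumption holds, requesting a wider type when building the array (for example `dtype=np.int64`) would be the usual way to avoid the overflow.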

For lists and tuples, the speed is often close to identical. The cost of creating a tuple from a list might have an impact depending on your use case; for speed, however, let's call them roughly equivalent.
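
For anyone who wants to check this on their own machine, a rough sketch using the standard-library `timeit` module follows. The size and repeat count are arbitrary choices for this illustration, and no particular timings are claimed:

```python
import timeit

n = 100_000                      # much smaller than the answer's 10,000,000
as_list = list(range(n))
as_tuple = tuple(as_list)        # the extra conversion step discussed above

# Time only the summation; absolute numbers vary by machine and Python version.
list_time = timeit.timeit(lambda: sum(as_list), number=20)
tuple_time = timeit.timeit(lambda: sum(as_tuple), number=20)

print(f"list  sum: {list_time:.4f} s for 20 runs")
print(f"tuple sum: {tuple_time:.4f} s for 20 runs")
```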

So: numpy, take care. Lists or tuples? Which is more Beautiful, more Explicit, Simpler, Flatter (see PEP 20)? For my money, the extra tuple(...) is less Zen.

I do not find the argument that a tuple is immutable compelling. Given Python's general flexibility in typing, using capitalization in variable names (DONT_CHANGE_ME versus feel_free_to_change) to signal intent is clearer. This suggestion comes from PEP 8: https://www.python.org/dev/peps/pep-0008/#descriptive-naming-styles

jwal
  • `DONT_CHANGE_ME` is an *awful* variable name, and uppercase vs lowercase variable names as a way to signify what should and shouldn't be changed is a terrible idea and a silly convention I've never heard anybody ever suggest before. Uppercase variable names in Python are for globals. Saying something like "don't change me" in a variable name is also... lol. Like. You want to ENSURE that your thing can't be changed, not just hope other people will listen to your pleas. See [this](https://stackoverflow.com/a/22140115/6298712) post for more on why `tuple` is better than `list` in many ways. – ddejohn Mar 28 '21 at 02:57
  • PS—you're doing unnecessary work by creating a `list` first and then casting to `tuple` when you *could* just do `tuple(range(10000000))` which is twice as fast. FWIW though, you could also do `[*range(10000000)]` which is also about twice as fast in creation as a bog-standard comprehension. – ddejohn Mar 28 '21 at 02:59
  • Also kind of a silly point to make that "numpy is faster" if it gives you an unusable result. – ddejohn Mar 28 '21 at 03:01
  • blorgon - if you _understand_ this about numpy the use cases where the speed improvements are useful are _spectacular_. – jwal Mar 28 '21 at 03:52
  • Please enlighten me as to a case where *quickly doing something incorrect* is preferable to *slowly doing something correct*. – ddejohn Mar 28 '21 at 04:02
  • Lots of image processing functions benefit from fast finite word length math. – jwal Mar 28 '21 at 04:21
  • It seems to me like the best thing to do would be to use a comprehension to form a tuple *as well as* using all caps for the variable name, which is why I've accepted @blorgon's answer. But since my question was about what to do with a list that you already have at some point in the runtime code, I can agree with jwal that "converting" to a tuple may be un-Pythonic. – will-hedges Mar 28 '21 at 12:24