8

I have a medium-amount of base objects.

These base objects will be put in collections, and these collections will be munged around: sorted, truncated, etc.

Unfortunately, the n is large enough that memory consumption is slightly worrisome, and speed is getting concerning.

My understanding is that tuples are slightly more memory-efficient, since they are deduplicated.

Anyway, I would like to know what the cpu/memory tradeoffs of lists vs. tuples are in Python 2.6/2.7.

Cristian Ciupitu
  • 20,270
  • 7
  • 50
  • 76
Paul Nathan
  • 39,638
  • 28
  • 112
  • 212

7 Answers7

16

If you have a tuple and a list with the same elements, the tuple takes less space. Since tuples are immutable, you can't sort them, add to them, etc. I recommend watching this talk by Alex Gaynor for a quick intro on when to choose what datastructure in Python.

UPDATE: Thinking about it some more, you may want to look into optimizing the space usage of your objects, e.g., via __slots__ or using namedtuple instances as proxies instead of the actual objects. This would likely lead to much bigger savings, since you have N of them and (presumbaly) only a few collections in which they appear. namedtuple in particular is super awesome; check out Raymond Hettinger's talk.

Hank Gay
  • 70,339
  • 36
  • 160
  • 222
9

As others mentioned tuples are immutable. Sorting a tuple (e.g. sorted(mytuple)) returns a list, which you would then have to cast back to a tuple.

To sort a tuple (and keep it a tuple) you'd have to do this:

mytuple = (3,2,1)
mysortedtuple = tuple(sorted(mytuple))

To sort a list you'd have to do this:

mylist = [3,2,1]
mylist.sort()

Because you're not casting and re-casting, the latter, in this instance, is more efficient.

Don't get hung up on using tuples over lists unless you have a good justification. If you need sorted data, tuples are not the way to go unless they are created that way in the first place. Tuples excel when the data they contain DOES NOT CHANGE, such as with configuration settings that are loaded at run-time, or data that has already been processed.

Considering that you mentioned you are processing a large dataset, you might want to look at using a functional programming style by way of generators and iterators over lists and tuples. This way you're not shuttling around and creating new containers, but just chaining iteration operations to get to the end result.

Further reading:

jathanism
  • 33,067
  • 9
  • 68
  • 86
  • I am munging around large collections of objects. The objects themselves will change. For the most part, the collections are being filtered and mapped. A number of these collections are likely to contain the same objects (references to objects, really). I am considering using *tuples* as a mechanism to minimize memory usage. – Paul Nathan May 19 '11 at 19:12
  • I'm with you! Taking you completely literally here: What I am suggesting is that instead of doing `filter` and `map` (which return lists), use the itertools equivalent of each: `itertools.ifilter` and `itertools.imap`, which return iterators. If you do this all the way down the line, then you can make the end result be a tuple. Even the initial collection that is being filtered/mapped can and should be an iterator instead of a list/tuple if at all possible. This way you're only generating a new collection once you have what you really want. – jathanism May 19 '11 at 19:23
  • @PaulNathan in your comment above, did you mean "The objects themselves will **not** change"? Was reading this exchange about immutable objects, and thought I was following it, until that sentence threw me. Thanks all for this very educational discussion! – dampier Aug 13 '13 at 05:33
4

What is the (average, min, max) number of base objects in a collection?

Tuples are "deduplicated" and lists are not? What do you think that "deduplicated" means in this context?

Lists do take up more memory than tuples, because extra memory is allocated on the presumption that a list is going to grow and you definitely don't want to realloc() memory each time you do large_list.append(). However on a 32-bit machine, the amortised cost of an extra list element is 4 bytes for a pointer, N bytes for the element itself, and no more than another 4 bytes for the extra memory. N is 16 bytes for a float. That means a list of floats takes up to 24 bytes per extra float, compared with 20 bytes for a tuple. A "base object" with N==100 gives a comparison of 108 versus 104. If a based object is referred to in two collections, then 58 versus 54. How big is your N?

Advice: Leave your collections as lists. Concentrate on:

  • ensuring your base objects are memory-efficient

  • use generators and itertools goodies instead of temporary lists where possible

  • if you can't avoid having temporary lists, ensure that they are thrown away immmediately they are not needed any more i.e. don't wait till the creating method returns; use explicit del as soon as possible.

John Machin
  • 81,303
  • 11
  • 141
  • 189
3

In addition to all these suggestions, you may find that numpy will fill your needs. If your objects are something that numpy handles by default (ints, native C types, etc) then that would be ideal. You can use a numpy array with custom objects as well, but that might be more work than it's worth.

Glorfindel
  • 21,988
  • 13
  • 81
  • 109
Falmarri
  • 47,727
  • 41
  • 151
  • 191
2

You can't use them the same way. Tuples are immutable and don't support appending, sorting, etc (calling sorted on a tuple yields a list, and so on). Tuples are totally different from lists, so any performance comparison is meaningless.

Rafe Kettler
  • 75,757
  • 21
  • 156
  • 151
  • let's imagine that they are both being used as containers with an ordering of indices for the nonce, ok? – Paul Nathan May 19 '11 at 18:57
  • @Paul I don't get what you're saying. How would that affect the fact that tuples cannot be manipulated the way you want to. – Rafe Kettler May 19 '11 at 18:58
  • Let's assume that algorithms can be rewritten to cope with immutability. With that assumption, let us proceed to analyzing Python lists vs. tuples for efficiency. – Paul Nathan May 19 '11 at 19:00
  • 2
    @Paul in that case, there's no comparison. It'd be like comparing a set to a dict; doesn't make sense. I can, however, say that the tuple was not intended to be used this way. – Rafe Kettler May 19 '11 at 19:02
  • @Rafe: They are both sets with an ordering of elements... It makes a ton of sense to compare them. I don't understand your objections in the *least*. – Paul Nathan May 19 '11 at 19:05
  • 1
    @Paul tuples are not ordered, they are structured. They don't behave at all like lists, and are not used to accomplish the same things. You need to be seeking a way to store your data out-of-memory rather than trying to use a data structure that won't do what you need it to, – Rafe Kettler May 19 '11 at 19:07
1

You cannot sort an immutable object - i.e. when sorting a tuple you'll always create a new one.

ThiefMaster
  • 310,957
  • 84
  • 592
  • 636
1

There are at least two existing questions that are similar enough to yours that the answers (or links within them) may be useful to you. To summarize: let the features of the type (mutable vs. immutable, heterogeneous vs. homogeneous) rather than performance guide your decision, because the performance/efficiency differences are minimal.

What's the difference between list and tuples in Python?
What are differences between List, Dictionary and Tuple in Python?

Community
  • 1
  • 1
ajk
  • 4,473
  • 2
  • 19
  • 24