9

[Python 3.1]

I'm following up on the design concept that tuples should be of known length (see this comment), and unknown length tuples should be replaced with lists in most circumstances. My question is under what circumstances should I deviate from that rule?

For example, I understand that tuples are faster to create from string and numeric literals than lists (see another comment). So, if I have performance-critical code where there are numerous calculations such as sumproduct(tuple1, tuple2), should I redefine them to work on lists despite a performance hit? (sumproduct((x, y, z), (a, b, c)) is defined as x * a + y * b + z * c, and its arguments have unspecified but equal lengths).

And what about the tuple that is automatically built by Python when using def f(*x)? I assume it's not something I should coerce to list every time I use it.

Btw, is (x, y, z) faster to create than [x, y, z] (for variables rather than literals)?

Community
  • 1
  • 1
max
  • 49,282
  • 56
  • 208
  • 355

4 Answers4

19

In my mind, the only interesting distinction between tuples and lists is that lists are mutable and tuples are not. The other distinctions that people mention seem completely artificial to me: tuples are like structs and lists are like arrays (this is where the "tuples should be a known length" comes from). But how is struct-ness aligned with immutability? It isn't.

The only distinction that matters is the distinction the language makes: mutability. If you need to modify the object, definitely use a list. If you need to hash the object (as a key in a dict, or an element of a set), then you need it to be immutable, so use a tuple. That's it.

Ned Batchelder
  • 364,293
  • 75
  • 561
  • 662
3

I always use the most the appropriate data structure for the job and do not really worry about if a tuple would save me half a millisecond here or there. Pre-obfuscating your code does not usually pay off in the end. If the code runs too slow you can always profile it later and change the .01% of code where it really matters.

All the things you are talking about are tied in to the implementation of the python version and the hardware it is running on. You can always time those things your self to see what they would be on your machine.

A common example of this is the 'old immutable strings are slow to concatenate' in python. This was true about 10 years ago, and then they changed the implementation in 2.4 or 2.5. If you do your own tests they now run faster than lists, but people are convinced of this still today and use silly constructs that actually ran slower!

nate c
  • 8,802
  • 2
  • 27
  • 28
  • Perhaps you should do some profiling before you say that others haven't. Try constructing tuples and lists from numeric and string literals in timeit and see what happens. Also, what's all this stuff about obfuscated python about? How is a tuple obfuscating anything? Seems perfectly clear to me. – aaronasterling Nov 18 '10 at 10:30
  • If your using tuples when the code would have been clearer with lists then it is obfuscating code. A tuple in itself may be clear to anyone, but it is piece of a greater code block. Read Ned Batchelder answer. If you need one use it. If you need the other use that. Both a tuple and a list are constructed in 'constant time' O(1). So is it worth compromising design for a little speed? – nate c Nov 18 '10 at 22:55
2

under what circumstances should I deviate from that [tuples should be of known length] rule?

None.

It's a matter of meaning. If an object has meaning based on a fixed number of elements, then it's a tuple. (x,y) coordinates, (c,m,y,k) colors, (lat, lon) position, etc., etc.

A tuple has a fixed number of elements based on the problem domain in general and the specifics of the problem at hand.

Designing a tuple with an indefinite number of elements makes little sense. When do we switch from (x,y) to (x,y,z) and then to (x,y,z,w) coordinates? Not by simply concatenating a value as if it's a list? If we're moving from 2-d to 3-d coordinates there's usually some pretty fancy math to map the coordinate systems. Not appending an element to a list.

What does it mean to move from (r,g,b) colors to something else? What is the 4th color in the rgb system? For that matter, what's the fifth color in the cmyk ststem?

Tuples do not change size.

*args is a tuple because it is immutable. Yes, it has an indefinite number of arguments, but it's a rare counter-exmaple to tuples of known, defined sizes.


What to do about an indefinite length tuple. This counter-example is so profound that we have two choices.

  1. Reject the very idea that tuples are fixed-length, and constrained by the problem,. The very idea of (x,y) coordinates and (r,g,b) colors is utterly worthless and wrong because of this counter-example. Fixed-length tuples? Never.

  2. Always convert all *args to lists to always have a fussy level of unthinking conformance to a design principle. Covert to lists? Always.

I love all or nothing choices, since they make software engineering so simplistic and unthinking.

Perhaps, in these corner cases, there's a tiny scrap of "this requires thinking" here. A tiny scrap.

Yes, *args is a tuple. Yes, it's of indefinite length. Yes, it's a counter-example where "fixed by the problem domain" is trumped by "simply immutable".

This leads us to the third choice in the case where a sequence is immutable for a different reason. You'll never mutate it, so it's okay to be a tuple of indefinite size. In the even-more-rare case where you're popping values of *args because you're treating it like a stack or a queue, then you might want to make a list out of it. But we can't pre-solve all possible problems.

Sometimes Thinking Is Required.


When you're doing design, you design a tuple for a reason. To impose a meaningful structure on your data. Fixed-length number of elements? Tuple. Variable number of elements (i.e., mutable)? List.

S.Lott
  • 384,516
  • 81
  • 508
  • 779
  • Is performance hit too small to worry about? And what should I do about Python's own indefinite-length tuples (when arguments are packed into a tuple) - should I coerce it into a list right away? – max Nov 18 '10 at 06:55
  • 1
    I can think of at least one counterexample to your claim. `*args`. That's a tuple and I have no idea how long it's going to be. I don't think that you could come up with a more 'pythonic' counterexample than that. – aaronasterling Nov 18 '10 at 10:32
  • 1
    @max: "Is performance hit too small to worry about?" Yes. It's about **meaning**. Performance doesn't matter in this case. If you need more speed, find the right algorithm. If you can **prove** you have the right algorithm and you can **prove** the Python implementation is too slow, switch to C for that one thing. – S.Lott Nov 18 '10 at 10:59
  • @max, "when arguments are packed into a tuple" you **never** append or modify this. Since you **never** append or modify, you don't need to convert it to a list. There's no **meaning** to that. You just use it as a tuple. – S.Lott Nov 18 '10 at 11:00
  • 3
    What do you do if you have an object of a fixed size, but you need to modify its elements? Or if you have an object of variable size, but you need to use it as a key? – Ned Batchelder Nov 18 '10 at 12:06
  • @Ned Batchelder: Fixed-size but mutable... That's a perfect time to create new immutable objects from old immutable objects in the style of functional programming instead of OO programming. It works out really, really well. And yes, converting a list (or a dict) to a tuple to make it suitable for a dict key is an odd, but necessary hack. – S.Lott Nov 18 '10 at 12:23
  • 2
    Sounds like an over-simplification to me... ;-) – martineau Nov 18 '10 at 13:04
  • What about my example with `sumproduct`? Would you see it as bad design if it uses tuples (of unknown length, but immutable)? – max Nov 19 '10 at 11:55
  • 1
    @max: A generic "sumproduct" isn't really appropriate. In most applications you actually **do** know what size your vectors are and you actually **should** write sumproduct that fits the actual size of your actual vectors. You don't really need a generic, solves-all-possible-problems, universal `sumproduct` function. You really need a specialized `sumproduct4` for doing coordinate transformations on (x,y,z,r) vectors. – S.Lott Nov 19 '10 at 13:13
  • 1
    @S.Lott: I respectfully disagree. I use `sumproduct` to calculate, for example, the current value of a portfolio of stocks, which is the sum of `(quantity(i) * price(i))` across all stocks `i` in the portfolio. Portfolio size is variable. – max Nov 20 '10 at 10:40
  • 1
    @max: In that case, you have a list **to begin with**. Not a proper vector. Please don't confuse fixed-length vectors of known dimensionality with lists of values. Please keep them clearly separated. They have **nothing** to do with other. Except for the coincidence that they're both sequences and -- with some care -- you can make it ambiguous. It's not inherently ambiguous. Fixed-length vectors (i.e., tuples) and arbitrary-length lists solve different kinds of problems. But may have a sumproduct behavior, but there's **essentially** different. – S.Lott Nov 22 '10 at 12:29
1

In this case, you should probably consider using numpy and numpy arrays.

There is some overhead converting to and from numpy arrays, but if you are doing a bunch of calculation it will be much faster

John La Rooy
  • 295,403
  • 53
  • 369
  • 502