64

The purpose of my question is to strengthen my knowledge base with Python and get a better picture of it, which includes knowing its faults and surprises. To keep things specific, I'm only interested in the CPython interpreter.

I'm looking for something similar to what I learned from my PHP landmines question, where some of the answers were well known to me but a couple were borderline horrifying.

Update: Apparently one, maybe two, people are upset that I asked a question that's already partially answered outside of Stack Overflow. As a compromise, here's the URL: http://www.ferg.org/projects/python_gotchas.html

Note that one or two answers here already go beyond what was written on the site referenced above.

ForceBru
David
  • Not sure there are many 'gotchas' moving from 2.5 to 2.6; if your intention is the Python 2.X series in general, it may be better to change the title to 2.X. – monkut Feb 10 '09 at 00:28
  • What was wrong with the list in http://www.ferg.org/projects/python_gotchas.html ? – S.Lott Feb 10 '09 at 02:19
  • @S. Lott - Nothing wrong with it, just that I didn't know about it and no one's asked this question on SO. – David Feb 10 '09 at 04:29
  • @David, are you kidding? it's #1 on google for "python gotchas" –  Feb 10 '09 at 16:38
  • @hop The top rated answer right now for this question isn't mentioned in the ferg.org page. Maybe if Guido had written the ferg.org page and I had known about it, then I wouldn't have bothered, but no one singular person knows everything. – David Feb 10 '09 at 18:40
  • nvm, that answer appears to have disappeared? – David Feb 10 '09 at 18:49
  • @S.Lott - What is wrong? The ferg.org link is broken. – Peter M. - stands for Monica Jan 16 '18 at 20:50

23 Answers

85

Expressions in default arguments are calculated when the function is defined, not when it’s called.

Example: consider defaulting an argument to the current time:

>>> import time
>>> def report(when=time.time()):
...     print when
...
>>> report()
1210294387.19
>>> time.sleep(5)
>>> report()
1210294387.19

The when argument doesn't change: it is evaluated once, when the function is defined, and it won't change until the application is restarted.

Strategy: you won't trip over this if you default arguments to None and then do something useful when you see it:

>>> def report(when=None):
...     if when is None:
...         when = time.time()
...     print when
...
>>> report()
1210294762.29
>>> time.sleep(5)
>>> report()
1210294772.23

Exercise, to make sure you've understood: why is this happening?

>>> def spam(eggs=[]):
...     eggs.append("spam")
...     return eggs
...
>>> spam()
['spam']
>>> spam()
['spam', 'spam']
>>> spam()
['spam', 'spam', 'spam']
>>> spam()
['spam', 'spam', 'spam', 'spam']
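A quick way to check your answer to the exercise is to look at the default values stored on the function object itself. A minimal sketch in Python 3 syntax; spam_fixed is a hypothetical name for the None-sentinel version described above:

```python
def spam(eggs=[]):
    eggs.append("spam")
    return eggs

spam()
spam()
# The single default list lives on the function object and is mutated by every call
print(spam.__defaults__)  # (['spam', 'spam'],)

def spam_fixed(eggs=None):
    if eggs is None:
        eggs = []  # a fresh list is created on every call
    eggs.append("spam")
    return eggs

print(spam_fixed())  # ['spam']
print(spam_fixed())  # ['spam']
```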
Garth Kidd
  • +1 Excellent point! I actually have relied on this in a similar context, but I could easily see this catching the unwary off guard! – David Feb 10 '09 at 01:34
  • That's the best-known gotcha, but I've never been bitten by it before knowing it. – hasen Feb 10 '09 at 01:44
  • The same is true for class level variables (an easy mistake to make when first learning python) – Richard Levasseur Feb 10 '09 at 07:30
  • The Python designers made a lot of good design decisions, but this was not one of them. +1 – BlueRaja - Danny Pflughoeft May 25 '10 at 18:46
  • I give up. Why is it happening? – Geoffrey Jul 30 '10 at 14:05
  • The default argument is created only once: when the function is defined. It gets re-used every time the function is called. In this case, the default argument is a list. So, what happens each time the function is called? – Garth Kidd Aug 02 '10 at 12:43
  • You could use a tuple instead of a list as the default argument. In general, default values should be of an immutable type (NoneType, int, tuple, etc.). – Abgan Apr 15 '11 at 06:43
62

You should be aware of how class variables are handled in Python. Consider the following class hierarchy:

class AAA(object):
    x = 1

class BBB(AAA):
    pass

class CCC(AAA):
    pass

Now, check the output of the following code:

>>> print AAA.x, BBB.x, CCC.x
1 1 1
>>> BBB.x = 2
>>> print AAA.x, BBB.x, CCC.x
1 2 1
>>> AAA.x = 3
>>> print AAA.x, BBB.x, CCC.x
3 2 3

Surprised? You won't be if you remember that class variables are stored internally in a dictionary on the class object. For read operations, if a name is not found in the current class's dictionary, the parent classes are searched for it. Here is the same code again, with explanations:

# AAA: {'x': 1}, BBB: {}, CCC: {}
>>> print AAA.x, BBB.x, CCC.x
1 1 1
>>> BBB.x = 2
# AAA: {'x': 1}, BBB: {'x': 2}, CCC: {}
>>> print AAA.x, BBB.x, CCC.x
1 2 1
>>> AAA.x = 3
# AAA: {'x': 3}, BBB: {'x': 2}, CCC: {}
>>> print AAA.x, BBB.x, CCC.x
3 2 3

Same goes for handling class variables in class instances (treat this example as a continuation of the one above):

>>> a = AAA()
# a: {}, AAA: {'x': 3}
>>> print a.x, AAA.x
3 3
>>> a.x = 4
# a: {'x': 4}, AAA: {'x': 3}
>>> print a.x, AAA.x
4 3
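You can inspect those dictionaries directly through __dict__ (or vars() for an instance). A sketch in Python 3 syntax that replays the steps above:

```python
class AAA(object):
    x = 1

class BBB(AAA):
    pass

class CCC(AAA):
    pass

BBB.x = 2
AAA.x = 3
# Each class's own __dict__ only holds names assigned directly on that class
print('x' in AAA.__dict__, 'x' in BBB.__dict__, 'x' in CCC.__dict__)  # True True False

a = AAA()
a.x = 4
print(vars(a), AAA.x)  # {'x': 4} 3
# Deleting the instance attribute makes lookup fall back to the class again
del a.x
print(a.x)  # 3
```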
Him
Dzinx
41

Loops and lambdas (or any closure, really): free variables are looked up by name when the function is called, not bound to their value when it is defined

funcs = []
for x in range(5):
    funcs.append(lambda: x)

[f() for f in funcs]
# output:
# [4, 4, 4, 4, 4]

A workaround is either to create a separate function or to capture the current value in a default argument:

funcs = []
for x in range(5):
    funcs.append(lambda x=x: x)
[f() for f in funcs]
# output:
# [0, 1, 2, 3, 4]
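The "separate function" workaround can be sketched as a small factory: each call creates a new scope, so each lambda closes over its own value. make_getter and identity are hypothetical names, and functools.partial is shown as another standard-library option (Python 3 syntax):

```python
from functools import partial

def make_getter(value):
    # Each call creates a new scope, so each lambda closes over its own 'value'
    return lambda: value

funcs = [make_getter(x) for x in range(5)]
print([f() for f in funcs])  # [0, 1, 2, 3, 4]

def identity(value):
    return value

# partial stores the argument value at creation time
partials = [partial(identity, x) for x in range(5)]
print([p() for p in partials])  # [0, 1, 2, 3, 4]
```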
Richard Levasseur
20

Dynamic binding makes typos in your variable names surprisingly hard to find. It's easy to spend half an hour fixing a trivial bug.

EDIT: an example...

for item in some_list:
    pass  # ... lots of code
# ... more code
for tiem in some_other_list:
    process(item)  # oops!
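The typo goes unnoticed because a for loop's variable stays bound after the loop finishes, so the misspelled loop still finds an item to use. A runnable sketch in Python 3 syntax; some_list, some_other_list and process are hypothetical stand-ins for the names above:

```python
# Hypothetical stand-ins for the names in the example above
some_list = [1, 2, 3]
some_other_list = ['a', 'b']
processed = []

def process(value):
    processed.append(value)

for item in some_list:
    pass  # lots of code

# 'item' is still bound to the last element of some_list...
for tiem in some_other_list:
    process(item)  # ...so the typo runs without any NameError

print(processed)  # [3, 3]
```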
Algorias
  • +1 Yeah, that's screwed me up once or twice. Any chance you could provide an example in your answer, though? – David Feb 09 '09 at 23:46
  • I suppose so, but this was just for illustration's sake. Actual occurrences of this type of bug tend to be a bit more involved. – Algorias Feb 10 '09 at 21:44
  • You can use static checkers like PyLint to find these mistakes -- `tiem` would be marked as an unused variable. – Dzinx Apr 07 '11 at 09:13
18

One of the biggest surprises I ever had with Python is this one:

a = ([42],)
a[0] += [43, 44]

This works as one might expect, except that it raises a TypeError after updating the first entry of the tuple! So a will be ([42, 43, 44],) after executing the += statement, but there will be an exception anyway. If, on the other hand, you try this

a = ([42],)
b = a[0]
b += [43, 44]

you won't get an error.
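The explanation (also given in the Python FAQ linked in the comments) is that a[0] += x first calls the list's in-place __iadd__, which succeeds, and then tries to store the result back with a[0] = ..., which a tuple forbids. A sketch in Python 3 syntax that spells out those two steps; b and tmp are names used only for illustration:

```python
a = ([42],)
try:
    a[0] += [43, 44]
except TypeError as exc:
    print("TypeError:", exc)
print(a)  # ([42, 43, 44],)

# Roughly what the augmented assignment does, step by step:
b = ([42],)
tmp = b[0].__iadd__([43, 44])  # list.__iadd__ extends the list in place and returns it
try:
    b[0] = tmp  # storing the result back into the tuple is what raises
except TypeError:
    pass
print(b)  # ([42, 43, 44],)
```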

Sven Marnach
  • Or you could simply write: `a[0].extend([43, 44])`. – Dzinx Apr 07 '11 at 09:17
  • Wow. I consider changing and raising an exception afterwards a bug in Python. Any reason why this might be just a wart? – Alfe Sep 05 '12 at 12:24
  • WOW. I expected the error, but I didn't expect it to actually _modify_ the list too. That's ugly. However, I don't believe I've ever run into this because I make a habit of not using tuples with data that I want to change. Even if it points to the same list, I'd rather the values remain immutable. If I want to have positional elements that can change, I'd either use a list, dictionary or class. – johannestaas Feb 07 '14 at 21:58
  • It's mentioned at https://docs.python.org/2/faq/programming.html#why-does-a-tuple-i-item-raise-an-exception-when-the-addition-works – selfboot Aug 01 '16 at 02:32
16
try:
    int("z")
except IndexError, ValueError:
    pass

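The reason this doesn't work is that IndexError is the type of exception you're catching, and ValueError is the name of the variable you're assigning the exception to. So the ValueError raised by int("z") is not caught at all, and if an IndexError were raised, it would be bound to the name ValueError, shadowing the built-in.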

The correct way to catch multiple exceptions is to list them in a tuple:

try:
    int("z")
except (IndexError, ValueError):
    pass
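If you also need the exception object, the as keyword (available since Python 2.6, and the only spelling Python 3 accepts, where the comma form is a SyntaxError) keeps the tuple and the variable name visually separate. A sketch; parse_int is a hypothetical helper:

```python
def parse_int(text):
    try:
        return int(text)
    except (IndexError, ValueError) as exc:
        # exc is bound to the exception; the names in the tuple are left untouched
        return "caught " + type(exc).__name__

print(parse_int("z"))   # caught ValueError
print(parse_int("42"))  # 42
```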
user537122
12

There was a lot of discussion of hidden language features a while back in hidden-features-of-python, where some pitfalls were mentioned (and some of the good stuff too).

Also you might want to check out Python Warts.

But for me, integer division's a gotcha:

>>> 5/2
2

You probably wanted:

>>> 5*1.0/2
2.5

If you really want this (C-like, at least for non-negative operands) behaviour, you should write:

>>> 5//2
2

That will work with floats too (and it will keep working when you eventually move to Python 3):

>>> 5*1.0//2
2.0
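Note that // is floor division, so the result only matches C's truncating integer division for non-negative operands. A sketch in Python 3 syntax (on Python 2 the first line needs from __future__ import division at the top of the module):

```python
import math

print(5 / 2)               # 2.5: true division
print(5 // 2)              # 2
print(5.0 // 2)            # 2.0
print(-5 // 2)             # -3: // floors toward negative infinity
print(math.trunc(-5 / 2))  # -2: truncation toward zero, which is what C does
```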

GvR explains how integer division came to work the way it does on The History of Python blog.

Tom Dunham
  • Definitely a gotcha. It's gotten so that adding "from __future__ import division" to every new .py file I create is practically a reflex. – Chris Upchurch Feb 10 '09 at 19:18
  • Makes sense supposing that 5 and 2 are actually variables. Otherwise you could just write 5./2 – Algorias Feb 10 '09 at 21:46
  • Why are you multiplying by 1.0? Wouldn't it be just as easy to make 5 be 5.0 or float(5) in case 5 is hidden in a variable. – Nope Jul 27 '09 at 17:31
  • "The correct work-around is subtle: casting an argument to float() is wrong if it could be a complex number; adding 0.0 to an argument doesn't preserve the sign of the argument if it was minus zero. The only solution without either downside is multiplying an argument (typically the first) by 1.0. This leaves the value and sign unchanged for float and complex, and turns int and long into a float with the corresponding value." (PEP 238 - http://www.python.org/dev/peps/pep-0238/) – Tom Dunham Jul 28 '09 at 16:15
  • Using float() instead of `*1.0` in the vast majority of cases, in which complex numbers cannot be involved, is better style IMHO because it states what you really intend. Multiplying by 1.0 to achieve this kind of obfuscates what you want, and an unaware reader of the code might be lured into thinking that the 1.0 is just a typo (*maybe it should be 10.0 instead?*). – Alfe Sep 05 '12 at 12:15
  • @Alfe would it be less likely to be perceived as a typo if it's `1.` as opposed to `1.0`? Both work the same. – Asclepius Oct 14 '16 at 06:43
  • No, I see no difference between *1.0 and *1., but when using either in cases when you intend to upgrade anything to at least float and to not-downgrade complex, I'd propose to comment this properly. When no complex comes into play I'd prefer using float() instead. – Alfe Oct 16 '16 at 13:18
11

List slicing has caused me a lot of grief. I actually consider the following behavior a bug.

Define a list x:

>>> x = [10, 20, 30, 40, 50]

Access index 2:

>>> x[2]
30

As you expect.

Slice the list from index 2 to the end of the list:

>>> x[2:]
[30, 40, 50]

As you expect.

Access index 7:

>>> x[7]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Again, as you expect.

However, try to slice the list from index 7 until the end of the list:

>>> x[7:]
[]

???

The remedy is to add plenty of checks when using list slicing. I wish I'd just get an error instead; it would be much easier to debug.
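Slicing clamps out-of-range bounds to the length of the sequence instead of raising, which is why x[7:] returns []. If you want the error, one option is a small wrapper; strict_tail is a hypothetical helper, not part of the standard library, and it treats start == len(seq) as valid (an empty tail). Python 3 syntax:

```python
def strict_tail(seq, start):
    """Hypothetical helper: like seq[start:], but raise IndexError instead of returning []."""
    if not -len(seq) <= start <= len(seq):
        raise IndexError("slice start %d out of range for length %d" % (start, len(seq)))
    return seq[start:]

x = [10, 20, 30, 40, 50]
# Plain slices silently clamp both bounds
print(x[7:], x[2:100], x[-100:])  # [] [30, 40, 50] [10, 20, 30, 40, 50]
print(strict_tail(x, 2))          # [30, 40, 50]
```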

Viktiglemma
  • I agree. It really hides those one-off bugs. – johannestaas Feb 07 '14 at 21:49
  • This one is actually quite predictable for Python and is useful when iterating over empty slices. – Asclepius Oct 14 '16 at 06:46
  • I don't think this is undesirable behavior. If you think about the logic behind slicing, this is something predictable and desirable... it's a bit like doing list comprehension with no list index satisfying the constraint that you put on the list to define it. That's useful! – Patrick Da Silva Apr 01 '18 at 12:32
11

Not including an __init__.py in your packages: in Python 2, a directory without one isn't treated as a package, so importing from it fails. That one still gets me sometimes.

Jason Baker
9

The only gotcha/surprise I've dealt with is CPython's GIL. If for whatever reason you expect Python threads in CPython to execute Python code in parallel on multiple cores... well, they don't: only the thread holding the GIL runs Python bytecode at any moment. This is well documented by the Python crowd and even Guido himself.

Here is a long but thorough explanation of CPython threading, what goes on under the hood, and why true parallelism with threads isn't possible in CPython: http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/
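The usual way around the GIL for CPU-bound work is to use processes instead of threads. A minimal sketch in Python 3 syntax using the standard multiprocessing module; cpu_bound and run_in_processes are hypothetical names, and the __main__ guard is needed on platforms that start workers by spawning a fresh interpreter:

```python
from multiprocessing import Pool

def cpu_bound(n):
    # Pure-Python arithmetic: a thread running this holds the GIL the whole time
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_in_processes(sizes):
    # Each worker is a separate process with its own interpreter and its own GIL,
    # so the calls can run in parallel on different cores
    with Pool(processes=2) as pool:
        return pool.map(cpu_bound, sizes)

if __name__ == "__main__":
    print(run_in_processes([10, 20]))  # [285, 2470]
```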

David
  • Check out the new multiprocessing module, available in 2.6, for thread-like handling using separate processes if the GIL is bothering you. http://docs.python.org/library/multiprocessing.html – monkut Feb 10 '09 at 00:26
  • @David - must have been pyprocessing, which has been made part of the standard library under the guise of multiprocessing – Ravi Jul 26 '09 at 08:48
9

James Dumay eloquently reminded me of another Python gotcha:

Not all of Python's “included batteries” are wonderful.

James’ specific example was the HTTP libraries: httplib, urllib, urllib2, urlparse, mimetools, and ftplib. Some of the functionality is duplicated, and some of the functionality you'd expect is completely absent, e.g. redirect handling. Frankly, it's horrible.

If I ever have to grab something via HTTP these days, I use the urlgrabber module forked from the Yum project.

Garth Kidd
  • I remember a couple years back giving up trying to accomplish what I wanted with the suite of tools above and ended up using pyCurl. – David Feb 12 '09 at 03:03
  • The fact that there's a module named urllib and a module named urllib2 still gets under my skin. – Jason Baker Aug 26 '09 at 12:06
  • This is probably the real reason for Python 3 :) They got to the point of, 'wait, where's the... let's start over'. – orokusaki Jan 07 '10 at 06:39
  • Python 3 is better organized with regard to these libraries. – Asclepius Oct 14 '16 at 06:48
7

Unintentionally mixing old-style and new-style classes can cause seemingly mysterious errors.

Say you have a simple class hierarchy consisting of superclass A and subclass B. When B is instantiated, A's constructor must be called first. The code below correctly does this:

class A(object):
    def __init__(self):
        self.a = 1

class B(A):
    def __init__(self):
        super(B, self).__init__()
        self.b = 1

b = B()

But if you forget to make A a new-style class and define it like this:

class A:
    def __init__(self):
        self.a = 1

you get this traceback:

Traceback (most recent call last):
  File "AB.py", line 11, in <module>
    b = B()
  File "AB.py", line 7, in __init__
    super(B, self).__init__()
TypeError: super() argument 1 must be type, not classobj

Two other questions relating to this issue are 489269 and 770134

Dawie Strauss
7
def f():
    x += 1

x = 42
f()

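results in an UnboundLocalError, because local names are determined statically, when the function is compiled: since x is assigned somewhere in f, every use of x in f refers to the local variable, which hasn't been bound yet when x += 1 reads it. A different example would be: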

def f():
    print x
    x = 43

x = 42
f()
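You can see that the decision is made before f ever runs: the compiler records x as one of f's local variable names. A sketch in Python 3 syntax:

```python
x = 42

def f():
    print(x)  # looks like it should read the global...
    x = 43    # ...but this assignment makes x local throughout f

# The compiler has already classified x as a local variable of f
print('x' in f.__code__.co_varnames)  # True

try:
    f()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```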
Sven Marnach
  • If you use global x inside of f(), that will allow you to reference variables outside of f's scope. – David Feb 21 '11 at 18:46
  • @David: I know why you get an error. The point of the post is that most people don't *expect* to get an error here, so it's a gotcha. – Sven Marnach Feb 21 '11 at 21:20
  • I was commenting more for the sake of others, not all but most of the gotchas here have solutions. For myself, I didn't even know about the global symbol/operator until two years after I started using Python. – David Feb 22 '11 at 06:25
  • Using global variables like that is a bad code smell anyway. I'm glad you didn't come across it until two years in; it means you weren't writing functions that operate on global-scope variables. – Zoran Pavlovic Dec 18 '12 at 08:40
  • @ZoranPavlovic: You get the same issue when using closures instead of global variables. A counter implemented as a closure would be perfectly reasonable, but you have to be careful. – Sven Marnach Dec 18 '12 at 20:29
7

In Python 2, floats are not printed at full precision by default (print uses str, not repr):

x = 1.0 / 3
y = 0.333333333333
print x  #: 0.333333333333
print y  #: 0.333333333333
print x == y  #: False

repr prints too many digits (17 significant digits in Python 2.6 and earlier):

print repr(x)  #: 0.33333333333333331
print repr(y)  #: 0.33333333333300003
print x == 0.3333333333333333  #: True
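Python 2.7 and 3.1+ changed repr() to return the shortest string that round-trips to the same float, and from Python 3.2 str() does the same, so the output above looks different on newer interpreters. A sketch in Python 3 syntax; math.isclose (Python 3.5+) is shown as one common alternative to comparing floats with ==:

```python
import math

x = 1.0 / 3
y = 0.333333333333

print(repr(x))              # 0.3333333333333333 (shortest string that round-trips)
print("%.12g" % x)          # 0.333333333333 (12 significant digits, like the old str())
print(x == y)               # False
print(float(repr(x)) == x)  # True
print(math.isclose(x, y, rel_tol=1e-9))  # True
```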
pts
  • This is a compromise so that the float string is reasonably portable across python's platforms, since python uses hardware floats. – u0b34a0f6ae Aug 26 '09 at 09:03
5

You cannot use locals()['x'] = whatever to change local variable values as you might expect.

This works:

>>> x = 1
>>> x
1
>>> locals()['x'] = 2
>>> x
2

BUT:

>>> def test():
...     x = 1
...     print x
...     locals()['x'] = 2
...     print x  # *** prints 1, not 2 ***
...
>>> test()
1
1
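Inside a function, the documentation warns that changes to the dictionary returned by locals() may not affect the actual local variables. If you need values chosen by name at runtime, an explicit dictionary is the usual alternative. A sketch in Python 3 syntax; test_with_namespace is a hypothetical name:

```python
def test():
    x = 1
    locals()['x'] = 2  # modifies the dictionary only, not the local variable x
    return x

print(test())  # 1

def test_with_namespace():
    ns = {'x': 1}  # keep dynamically named values in an explicit dict instead
    ns['x'] = 2
    return ns['x']

print(test_with_namespace())  # 2
```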

This actually burnt me in an answer here on SO, since I had tested it outside a function and got the change I wanted. Afterwards, I found it mentioned and contrasted to the case of globals() in "Dive Into Python." See example 8.12. (Though it does not note that the change via locals() will work at the top level as I show above.)

Bill the Lizard
Anon
  • locals() at module level is the same thing as globals() anywhere in the module, is it not? It notes that globals() will take the change. – u0b34a0f6ae Aug 26 '09 at 09:07
5

x += [...] is not the same as x = x + [...] when x is a list:

>>> x = y = [1,2,3]
>>> x = x + [4]
>>> x == y
False

>>> x = y = [1,2,3]
>>> x += [4]
>>> x == y
True

x = x + [4] creates a new list and rebinds x to it, while x += [4] modifies the existing list in place, so y sees the change.
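Checking object identity with is makes the difference visible. A sketch in Python 3 syntax; the tuple case is included to show that += only mutates in place when the type supports it:

```python
x = y = [1, 2, 3]
x = x + [4]       # list.__add__ builds a new list; y still refers to the old one
print(x is y, y)  # False [1, 2, 3]

x = y = [1, 2, 3]
x += [4]          # list.__iadd__ extends the existing list in place
print(x is y, y)  # True [1, 2, 3, 4]

s = t = (1, 2, 3)
s += (4,)         # tuples have no __iadd__, so += falls back to creating a new tuple
print(s is t, t)  # False (1, 2, 3)
```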

David
mchen
4

List repetition with nested lists

This caught me out today and wasted an hour of my time debugging:

>>> x = [[]]*5
>>> x[0].append(0)

# Expect x equals [[0], [], [], [], []]
>>> x
[[0], [0], [0], [0], [0]]   # Oh dear

Explanation: Python list problem
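In short, * repeats the reference to the single inner list; it does not copy that list. A sketch in Python 3 syntax that checks this with is and id():

```python
x = [[]] * 5
# All five slots refer to the very same inner list object
print(all(inner is x[0] for inner in x))  # True
print(len({id(inner) for inner in x}))    # 1
```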

mchen
2

Using class variables when you want instance variables. Most of the time this doesn't cause problems, but if it's a mutable value it causes surprises.

class Foo(object):
    x = {}

But:

>>> f1 = Foo()
>>> f2 = Foo()
>>> f1.x['a'] = 'b'
>>> f2.x
{'a': 'b'}

You almost always want instance variables, which require you to assign inside __init__:

class Foo(object):
    def __init__(self):
        self.x = {}
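Here are both versions side by side, in Python 3 syntax; SharedFoo is a hypothetical name for the class-variable version above:

```python
class SharedFoo(object):
    x = {}           # one dict, stored on the class and shared by all instances

class Foo(object):
    def __init__(self):
        self.x = {}  # a new dict for each instance

s1, s2 = SharedFoo(), SharedFoo()
s1.x['a'] = 'b'      # mutates the single class-level dict
print(s2.x, s1.x is s2.x)  # {'a': 'b'} True

f1, f2 = Foo(), Foo()
f1.x['a'] = 'b'
print(f2.x, f1.x is f2.x)  # {} False
```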
Wilfred Hughes
2

Python 2 has some surprising behaviour with comparisons:

>>> print x
0
>>> print y
1
>>> x < y
False

What's going on? repr() to the rescue:

>>> print "x: %r, y: %r" % (x, y)
x: '0', y: 1
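On Python 3 the same comparison raises instead of silently returning a result, and in either version the fix is to convert explicitly before comparing. A sketch in Python 3 syntax:

```python
x, y = '0', 1
try:
    print(x < y)
except TypeError as exc:
    # Python 3 refuses to order a str against an int
    print("TypeError:", exc)

print(int(x) < y)  # True: convert explicitly before comparing
```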
Wilfred Hughes
  • This is actually fixed in Python 3: `TypeError: unorderable types: str() < int()`. – Simeon Visser Oct 01 '14 at 16:39
  • I could not replicate this in Python 2.7.7 – hello_there_andy Dec 09 '14 at 18:18
  • Isn't the surprise really the output of `print` rather than what happens in the comparison? `x + y` would also lead to something rather unexpected here, as would all kinds of operations between `x` and `y`. – jolvi Mar 19 '15 at 15:16
  • @hello_there_andy The issue absolutely does exist in Python 2.7.12, although it raises `TypeError` in Python 3.5.2. Just try `'0' < 1`. – Asclepius Oct 14 '16 at 06:55
1

If you assign to a variable inside a function, Python assumes that the variable is defined inside that function:

>>> x = 1
>>> def increase_x():
...     x += 1
... 
>>> increase_x()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in increase_x
UnboundLocalError: local variable 'x' referenced before assignment

Use global x (or nonlocal x in Python 3) to declare you want to set a variable defined outside your function.
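Both fixes, sketched in Python 3 syntax; make_counter is a hypothetical closure-based counter, which is the case where nonlocal is needed:

```python
x = 1

def increase_x():
    global x  # rebind the module-level x instead of creating a local
    x += 1

def make_counter():
    count = 0
    def increment():
        nonlocal count  # Python 3 only: rebind the enclosing function's variable
        count += 1
        return count
    return increment

increase_x()
print(x)  # 2

counter = make_counter()
counter()
print(counter())  # 2
```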

Wilfred Hughes
  • 29,846
  • 15
  • 139
  • 192
0

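The values of range(end_val) are not only strictly smaller than end_val, but strictly smaller than int(end_val). For a float argument to range, this might be an unexpected result (as discussed in the comments, this only happens with the range from the third-party future package, imported below; the built-in range raises a TypeError for a float argument):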

from future.builtins import range
list(range(2.89))
[0, 1]
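With the built-in range, a float argument raises TypeError instead, so the usual approach is to convert explicitly and choose the rounding deliberately. A sketch in Python 3 syntax (where math.ceil returns an int):

```python
import math

try:
    range(2.89)
except TypeError as exc:
    # The built-in range accepts only integers
    print("TypeError:", exc)

# Convert explicitly and pick the rounding you actually want
print(list(range(int(2.89))))        # [0, 1]
print(list(range(math.ceil(2.89))))  # [0, 1, 2]
```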
jolvi
  • I cannot replicate. `range(2.0)` causes `TypeError: 'float' object cannot be interpreted as an integer` in both Python 2.7.12 and 3.5.2. – Asclepius Oct 14 '16 at 06:59
  • Thanks, I updated the example. It "works" only if the `future` module's `range` is used. – jolvi Oct 22 '16 at 21:49
  • `from future.builtins import range` produces `ImportError: No module named future.builtins` on Python 2.7 and 3.5. Nobody that I know does anything like that anyway. I recommend deletion of the answer. Perhaps it was true fifteen years ago, but that shouldn't matter now. – Asclepius Oct 23 '16 at 01:11
  • You need to have installed the `future` package, obviously. The package deals with transitions and compatibility between Python 2 and 3, hence not quite outdated since fifteen years, see http://python-future.org. The package allows to write Python 3 code which remains Python 2 compatible. Without any other constraints, this is what everybody should do at this point in time, 2016, IMHO. – jolvi Oct 25 '16 at 22:37
  • You're going in circles considering that I already noted that the answer cannot be replicated with `range` on Python 3.5.2 either. The question is about Python 2.x gotchas, and basically your answer is not. – Asclepius Oct 26 '16 at 02:41
  • I am lost. First, how is it relevant what you find in Python 3 when, as you say in the next sentence, the question was for Python 2? Second, it is utterly useful to write Python 2 code which is as compatible as possible with Python 3. This is what the `future` package is for. So, what you see in the answer *is* Python 2 code. I agree that this gotcha will never be very frequently observed, but I *did* stumble over it in real life while developing "real" code. – jolvi Oct 27 '16 at 13:58
  • The problem here is apparently with a broken `range` function in the third-party package that you are using. It has nothing to do with the function with the same name in Python 2 or 3. That's why it's not a Python gotcha. I can come up with any number of broken functions in third party packages, but that's not what the question is about. – Asclepius Oct 27 '16 at 17:00
0

Due to 'truthiness' this makes sense:

>>> bool(1)
True

but you might not expect it to go the other way:

>>> float(True)
1.0

This can be a gotcha if you're converting data to numbers and it contains True/False values: because bool is a subclass of int, they silently become 1.0 and 0.0 instead of raising an error.
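If bools in your data should be treated as errors, you have to reject them explicitly, because isinstance(True, int) is true. A sketch in Python 3 syntax; to_float_strict is a hypothetical helper:

```python
print(isinstance(True, int))  # True: bool is a subclass of int
print(True + True)            # 2

def to_float_strict(value):
    # Hypothetical helper: reject bools explicitly before converting
    if isinstance(value, bool):
        raise TypeError("refusing to convert bool %r to float" % value)
    return float(value)

print([to_float_strict(v) for v in ["1.5", 2, 3.0]])  # [1.5, 2.0, 3.0]
```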

Bryan S
-1

If you create a list of lists this way:

arr = [[2]] * 5 
print arr 
[[2], [2], [2], [2], [2]]

then you get a list whose elements all point to the same object! This can be really confusing. Consider this:

arr[0][0] = 5

Then, if you print arr:

print arr
[[5], [5], [5], [5], [5]]

The proper way to initialize the list is, for example, with a list comprehension:

arr = [[2] for _ in range(5)]

arr[0][0] = 5

print arr

[[5], [2], [2], [2], [2]]