
I tried to find the performance difference between slice assignment and regular assignment for lists. Here is the code:

import time

N = 1000
a = list(range(N))
b = list(range(N))

time1 = time.time()
for i in range(N):
    a = [x for x in a if x is not i]
time2 = time.time()
for i in range(N):
    b[:] = [x for x in b if x is not i]
time3 = time.time()

print a
print b
print time2 - time1
print time3 - time2

I expected each loop to remove one element per iteration, so that print a and print b would both print empty lists. Instead, both always print the starting list with the elements 0 through 256 missing.

They both print:

[257, 258, 259 ... N-1]

What is happening?

I'm using Python 2.7.6.

Mazdak
oyvind

1 Answer


The problem is that you're using is instead of == (in your code, is not instead of !=).

The former checks for object identity, not equality. There's no reason to believe that evaluating, say, 300+1 twice will give you the same int object, just that they'll both give you int objects whose value is 301.
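
To make the distinction concrete, here is a minimal sketch (written so it also runs on Python 3, which is an assumption on my part since the question uses 2.7). The names first, second, x and y are just illustrative. The list half is guaranteed by the language; the integer half depends on the implementation.

```python
# Two lists with equal contents are always two separate objects.
first = [1, 2, 3]
second = [1, 2, 3]
print(first == second)   # True: same value
print(first is second)   # False: different objects

# Integers built at run time compare equal, but whether they are the
# same object is an implementation detail you should not depend on.
x = int("300") + 1
y = int("300") + 1
print(x == y)            # True: both are 301
print(x is y)            # implementation-dependent
```

The values are built with int("300") rather than written as literals because a compiler is free to merge equal constants into one object, which would hide the effect.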

This happens to "work" for numbers up to 256 because your particular Python implementation* happens to intern integers up to 256. At startup, it creates a singleton object for the number 1, a singleton object for 2, and so on. Any time an expression evaluates to the number 1, it gives you that object, instead of a new one.**
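
If you want to see where the cutoff sits on your own interpreter, something like the following sketch should work. The helper first_uncached is a hypothetical name, not anything from the standard library. On a CPython build with the default settings I would expect it to print 257, matching the output in the question, but treat that as an observation about one implementation rather than a guarantee.

```python
def first_uncached(limit=1000):
    """Return the smallest n in [0, limit) for which two freshly built
    ints of value n are distinct objects, or None if there is none."""
    for n in range(limit):
        # int(str(n)) builds the value at run time, which avoids the
        # compiler merging equal constants into a single object.
        if int(str(n)) is not int(str(n)):
            return n
    return None

print(first_uncached())
```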

Needless to say, you should not rely on that optimization.
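
For reference, this is roughly what the loops from the question look like with the comparison fixed (timing code left out, and print written as a function call so the sketch runs on Python 3 as well as 2.7). With != both lists end up empty, as originally expected.

```python
N = 1000
a = list(range(N))
b = list(range(N))

# Rebinding: each pass creates a new list and points the name a at it.
for i in range(N):
    a = [x for x in a if x != i]

# Slice assignment: each pass replaces the contents of the same list object.
for i in range(N):
    b[:] = [x for x in b if x != i]

print(a)  # []
print(b)  # []
```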


* IIRC, CPython has cached small integers since the 1.x days, and current versions (2.7 and 3.x, at least through 3.5) default to the range -5 to 256; some older releases used a smaller range. You can change those limits, or turn off the feature, at build time, and a different implementation might do something different.

** If you're wondering how this works in CPython, at the C API level, PyLong_FromLong does this by looking up numbers from -5 to 256 in an array of singleton values. You can see the 3.4 version of the code, for example, here; the macro CHECK_SMALL_INT, the actual function get_small_int that it calls, and the static array that function uses are all in the same file, up near the top.
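
The lookup described in that footnote can be modelled in a few lines of Python. This is only a sketch of the idea, not the actual C code: Boxed and boxed_from_long are hypothetical stand-ins for an int object and PyLong_FromLong, and the two constants simply encode the default range of -5 to 256.

```python
NSMALLNEGINTS = 5     # cache covers -5 ...
NSMALLPOSINTS = 257   # ... through 256

class Boxed(object):
    """Hypothetical stand-in for a heap-allocated int object."""
    def __init__(self, value):
        self.value = value

# Built once, up front: one shared object per small value.
small_ints = [Boxed(v) for v in range(-NSMALLNEGINTS, NSMALLPOSINTS)]

def boxed_from_long(value):
    """Return the shared object for small values, a fresh one otherwise."""
    if -NSMALLNEGINTS <= value < NSMALLPOSINTS:
        # Offset by NSMALLNEGINTS so that -5 lands at index 0.
        return small_ints[value + NSMALLNEGINTS]
    return Boxed(value)

print(boxed_from_long(256) is boxed_from_long(256))  # True: shared object
print(boxed_from_long(257) is boxed_from_long(257))  # False: two new objects
```

In this model, an identity check on values inside the range always succeeds and one on values outside it always fails, which is the same pattern the question ran into.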

abarnert