
I just ran 2to3 on code that looks like this (A):

def idict(n):
    return dict(zip(range(n), range(n)))

and it generated this (B):

def idict(n):
    return dict(list(zip(list(range(n)), list(range(n)))))

Both dict and zip can consume iterators, so why this translation?

B also seems to be noticeably slower. Testing with

python -m timeit -s "import B as t" "t.idict(10)"

gives the following results (usec per loop):

                  A      B      C
Python 2.7.13   2.89   3.82   2.29
Python 3.5.1    2.63   4.34   A

i.e. from 2.89 usec to 4.34 usec (+50%) with the default translation.
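For anyone who wants to reproduce the comparison without creating separate modules, here is a minimal sketch using the timeit module directly. The function names idict_a and idict_b are made up for this illustration, and the absolute numbers will differ by machine and interpreter version.

```python
import timeit

def idict_a(n):
    # Original form (A): dict consumes the zip iterator directly.
    return dict(zip(range(n), range(n)))

def idict_b(n):
    # Form produced by 2to3 (B): each lazy object is turned into a list first.
    return dict(list(zip(list(range(n)), list(range(n)))))

# Both forms build the same dictionary.
assert idict_a(10) == idict_b(10)

# Time each form; only the relative difference is meaningful.
loops = 100000
for name, func in (("A", idict_a), ("B", idict_b)):
    seconds = timeit.timeit(lambda: func(10), number=loops)
    print("%s: %.2f usec per loop" % (name, seconds / loops * 1e6))
```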

Questions: (i) Is there a reason I shouldn't use the original code in Python 3? (It produces the correct result and seems reasonable to me.) (ii) Is 2to3 the correct tool? (We need to run on both 2 and 3 while transitioning ~150 KLOC of Python.)

Update: I've added dict(itertools.izip(xrange(n), xrange(n))) as algorithm C in the table.

thebjorn
  • Possible duplicate of [Why does 2to3 change mydict.keys() to list(mydict.keys())?](https://stackoverflow.com/questions/27476079/why-does-2to3-change-mydict-keys-to-listmydict-keys) – Jasper Jun 08 '18 at 19:50
  • https://docs.python.org/2/library/2to3.html#2to3fixer-xrange – Jasper Jun 08 '18 at 19:54
  • A dict literal is even faster. – Josh Lee Jun 08 '18 at 20:07
  • @Jasper I don't think it's a duplicate. The other question is about the list ctor being added to a `dict.keys()` call in a for-loop context. The reasoning might be similar, but not the same. The other question is also purely about code and not about the tools. – thebjorn Jun 08 '18 at 20:13
  • @JoshLee it better be since one of the range calls is unneeded.. – thebjorn Jun 08 '18 at 20:17

1 Answer

2to3 doesn't see the global picture. It just creates equivalent code: wherever a function no longer returns a list in Python 3, it adds a list wrapper to make sure that:

  • one can subscript the result
  • one can iterate on the result as many times as wanted

(It also puts parentheses around print, among other things, but that is not relevant here.)

So it tries to make your code run, but it makes no attempt to preserve performance.
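To make the two points above concrete, here is a small sketch of what changes in Python 3 (it assumes a Python 3 interpreter; the variable names are only for illustration):

```python
pairs = zip(range(3), "abc")    # in Python 3 this is a lazy iterator, not a list

first_pass = list(pairs)        # consumes the iterator
second_pass = list(pairs)       # nothing is left for a second pass

try:
    zip(range(3), "abc")[0]     # iterators do not support subscripting
    subscriptable = True
except TypeError:
    subscriptable = False

# What 2to3 emits: a real list again, which can be indexed and re-iterated.
wrapped = list(zip(range(3), "abc"))
```

Code that indexes the result or loops over it twice needs the wrapper; code that hands the result straight to a consumer such as dict does not.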

In your example, the list wrapper is useless, as the dict consumes the iterator.

So this tool is useful for getting code to work quickly, but its output should not be used without comparing it to your original code and deciding what to keep and what to change.

The tool could probably be improved to:

  • avoid wrapping when the iterator is used in a loop
  • avoid wrapping when the iterator is passed to an object which takes an iterable as input.

In your case

dict(zip(range(n), range(n)))

is perfectly fine and runs faster in Python 3 than in Python 2 because it avoids creating intermediate lists, so leave it that way.

A Python 2 equivalent of that would be slightly more complex:

dict(itertools.izip(xrange(n), xrange(n)))
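Since the code has to run on both versions during the transition, one possible approach is a small compatibility shim that picks the lazy variants once, at import time. This is only a sketch: the names lazy_zip and lazy_range are invented for this example and are not part of any library.

```python
import sys

if sys.version_info[0] == 2:
    # Python 2: the lazy variants live under different names.
    from itertools import izip as lazy_zip
    lazy_range = xrange  # noqa: F821 (only defined on Python 2)
else:
    # Python 3: the built-ins are already lazy.
    lazy_zip = zip
    lazy_range = range

def idict(n):
    # No intermediate lists are built on either version.
    return dict(lazy_zip(lazy_range(n), lazy_range(n)))
```

Libraries such as six offer similar aliases, so a hand-written shim like this is mainly worth it if you want to avoid an extra dependency.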

My advice if you have a lot of code to translate (I've been there):

  • use the -3 switch with the Python 2 interpreter to exercise your code and get warnings instead of having it crash in Python 3. It is supposed to warn about Python 3.x incompatibilities that 2to3 cannot trivially fix; it misses a lot of cases, but it's better than nothing (for instance, it finds the infamous has_key calls)
  • use 2to3 and compare the results with your original code, then decide manually where to apply the changes
  • you can also use multi-file search/replace with tools like GrepWin to do what 2to3 would do, only with less risk of degrading performance:
    • search for iteritems, replace by items
    • search for xrange, replace by range
    • track down dict.has_key calls and uses of the unicode built-in
    • I may have forgotten some...
  • test and exercise your code extensively with Python 3. Some things are invisible to both the tool and the -3 option, such as using binary mode to read text files.
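As an illustration of the last point, here is a minimal sketch of the binary-mode pitfall (it assumes Python 3; the file name and contents are invented for the example). Neither 2to3 nor the -3 switch can flag this, because the code is syntactically valid on both versions.

```python
import io
import os
import tempfile

# Write a small text file to work with (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"key=value")

with io.open(path, "rb") as f:
    raw = f.read()      # bytes in Python 3 (a plain str in Python 2)

with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()     # text on both versions

# In Python 3, bytes and text never compare equal, so comparisons written
# for Python 2 can silently start returning False.
same = (raw == text)

# Decoding explicitly, or opening the file in text mode, restores the match.
decoded_same = (raw.decode("utf-8") == text)
```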
Jean-François Fabre