
I was going through the uses of `map` in the code to replace them with `imap` (mainly for style), so I ran a performance test on my main use cases:

```python
import random
import string
import timeit

repeat = 3
numbers = 1000

def time(statement, _setup=None):
    # Print the best of `repeat` runs, each executing `statement` `numbers` times
    print min(
        timeit.Timer(statement, setup=_setup or setup).repeat(repeat, numbers))

random.seed('slartibartfast')
length = 10  # list size; it was 10000 for the "big list" run
# `length` random 16-character uppercase strings
li = [''.join(random.choice(string.ascii_uppercase) for __ in range(16))
      for _ in range(length)]
print len(li)

# The timed statements import imap and the test list from this module
setup = """from itertools import imap
from __main__ import li
"""

print 'join:'
time("','.join(map(str.lower, li))")
time("','.join(imap(str.lower, li))")
print 'tuple:'
time("tuple(map(str.lower, li))")
time("tuple(imap(str.lower, li))")
print 'for:'
time('for _ in (map(str.lower, li)): pass')
time('for _ in (imap(str.lower, li)): pass')
```
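As background for the timings: in Python 2, `map()` returns a fully built list, while `itertools.imap()` returns a lazy iterator (Python 3 removed `imap` and made the built-in `map()` lazy). A minimal sketch of the difference in *when* the work happens, using a hypothetical `traced_lower` helper that records its calls; it falls back to the built-in `map` when `imap` is missing, so it runs on either version.

```python
try:
    from itertools import imap  # Python 2
except ImportError:             # Python 3: built-in map() is already lazy
    imap = map

calls = []  # every argument traced_lower() has been called with

def traced_lower(s):
    """Hypothetical helper: lower-case s and record that it was called."""
    calls.append(s)
    return s.lower()

li = ['AB', 'CD', 'EF']

# Equivalent of Python 2's map(): the whole result list is built immediately
eager = list(imap(traced_lower, li))
print(len(calls))  # 3: all items converted up front

del calls[:]       # reset the trace (list.clear() is Python 3 only)

# imap(): an iterator; items are computed one at a time, on demand
lazy = imap(traced_lower, li)
print(len(calls))  # 0: nothing computed yet
first = next(lazy)
print(len(calls))  # 1: only the first item so far
```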

The results for big lists looked fine (in each pair, the first time is for `map` and the second for `imap`):

```
10000
join:
4.34010698679
3.95416465252
tuple:
3.57020920625
3.93349155589
for:
3.17242116829
2.66427889191
```

But the `tuple` result surprised me, so I ran a test with smaller lists, which surprised me even more:

```
10
join:
0.00541496942211
0.0062448383055
tuple:
0.00517576996075
0.00560386370428
for:
0.00521174982402
0.00442818835725
```

The `for` loop is still faster with `imap` (anything else would really have been a surprise), but the `join` and `tuple` calls are slower with `imap`. The only explanation I can think of is that they exhaust the iterator up front; am I right? But if so, why is `join` faster with `imap` for big lists?
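Every consumer drains an `imap` object exactly once, so "exhausting the iterator" by itself does not distinguish `join` and `tuple` from a `for` loop; the open question is whether they first have to collect the items to know the final size. A minimal sketch with a hypothetical `CountingIterator` wrapper shows that a single `join` or `tuple` call pulls every item before returning. It cannot reveal whether an intermediate list is built internally; that is a CPython implementation detail.

```python
try:
    from itertools import imap  # Python 2
except ImportError:             # Python 3: built-in map() is already lazy
    imap = map

class CountingIterator(object):
    """Hypothetical wrapper: counts how many items have been pulled through it."""
    def __init__(self, iterable):
        self._it = iter(iterable)
        self.pulled = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)  # StopIteration propagates when exhausted
        self.pulled += 1
        return item

    next = __next__  # Python 2 name for the iterator method

li = ['AB', 'CD', 'EF']

it = CountingIterator(imap(str.lower, li))
joined = ','.join(it)
print(joined)     # ab,cd,ef
print(it.pulled)  # 3: one join call consumed every item
print(list(it))   # []: nothing is left afterwards

it = CountingIterator(imap(str.lower, li))
as_tuple = tuple(it)
print(it.pulled)  # 3: tuple() drained it as well
```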

DISCLAIMER: this is not a performance question; it's about how these things are implemented.

Mr_and_Mrs_D
  • `join` must create a list from the iterable anyway, because it has to know the size of the string to build beforehand. – user2390182 Jan 23 '17 at 11:56
  • @schwobaseggl: Thanks, that's what I suspected, but why is it then faster for big lists? And what about the `tuple` case: does it exhaust the iterator there too? – Mr_and_Mrs_D Jan 23 '17 at 12:00
  • Without knowing the implementation details, I suspect `tuple` and `join` are similar in that they both create immutable objects (tuple and string), which, for the sake of their in-memory footprint, might both have to exhaust the generator first to know the size of the array to use under the hood. Btw, have a rec for the "Hitchhiker's Guide" reference ;) – user2390182 Jan 23 '17 at 12:08
  • Thanks @schwobaseggl, the answer should lie somewhere inside [this PySequence_Tuple call](https://hg.python.org/cpython/file/tip/Objects/tupleobject.c#l661), but where the heck is that PySequence_Tuple? `set` with imap is consistently faster, btw. – Mr_and_Mrs_D Jan 23 '17 at 12:36
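A simplified pure-Python model of the `tuple()` side, assuming the strategy described in the comments of CPython's `PySequence_Tuple` (as far as I know it is defined in `Objects/abstract.c`, not `tupleobject.c`): a list argument, such as the result of Python 2's `map`, is copied straight into a tuple of known size, while an iterable without a usable length, such as an `imap` object, starts from a default guess of 10 slots and is reallocated (grown by 10, then by a further 25%) whenever it fills up. The hypothetical `tuple_fill_sizes` helper only records those buffer sizes; it does not reproduce CPython's actual timing.

```python
def tuple_fill_sizes(iterable, length_hint=10):
    """Hypothetical model of tuple() filling from an unsized iterator.

    Assumption: start with `length_hint` slots; when full, grow by 10 and
    then by 25% of the new size. Returns (item count, buffer sizes used).
    """
    size = length_hint
    sizes = [size]
    count = 0
    for _ in iterable:
        if count >= size:   # buffer full: "reallocate" a bigger one
            size += 10
            size += size >> 2
            sizes.append(size)
        count += 1
    return count, sizes

print(tuple_fill_sizes(range(10)))  # (10, [10]): fits the initial guess
print(tuple_fill_sizes(range(26)))  # (26, [10, 25, 43]): two reallocations
```

Under this model, 10 items fit the initial guess, while 10000 items go through a chain of reallocations that `tuple(map(...))` on a ready-made list never pays for. That is one plausible contributor to the large-list gap, not a measured explanation.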
