I ran into this question about converting ints to floats and wondered which of the suggested solutions would be faster (in my case I was converting to ints, not from them). This is the code I used, in IPython:

In [1]: import numpy as np
In [2]: a1 = np.random.rand(100000)*31
In [3]: %%timeit
   ...: a2 = [int(a) for a in a1]
12.3 ms ± 156 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [4]: %%timeit
   ...: a2 = list(map(int, a1))
9.58 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So what explains `map` being faster than the list comprehension here? Is this the expected behaviour? What should I read to learn more about it?

ETA: this great answer covers the "should this be", but not the "why"

Chiffa
  • Because they are implemented slightly differently. – juanpa.arrivillaga Nov 19 '22 at 01:44
  • That's not a huge surprise, is it? `map` is specifically designed for this operation. `map` has been part of Python from the very beginning. Its implementation is in C. List comprehensions are a (relatively) recent addition, and result in IL instructions that need to be interpreted. However, they are vastly more flexible, and thus usually preferred. This is not likely to be a bottleneck in your code. – Tim Roberts Nov 19 '22 at 01:44
  • Anyway, in this *particular* case, `map` is faster because it only resolves the global name `int` **once**, whereas the list comprehension version looks it up *on each iteration*. Global lookups are relatively expensive – juanpa.arrivillaga Nov 19 '22 at 01:45
  • @TimRoberts `map` is *not* faster, in general. `map` forces you to call a function on each iteration; if your mapping operation is always just a function call, then it is faster (since now it comes down to how that function is resolved). However, consider `[(x + 3) / (x + 4) for x in range(1_000)]` vs `list(map(lambda x: (x + 3) / (x + 4), range(1_000)))` and the list comprehension wins. Most of these differences will be marginal. – juanpa.arrivillaga Nov 19 '22 at 01:48
  • @SyedRafay but that isn't really a reasonable thing to do. – juanpa.arrivillaga Nov 19 '22 at 01:48
  • yeah sorry my bad, that comparison isn't valid @juanpa.arrivillaga – Syed Rafay Nov 19 '22 at 01:50
  • @SyedRafay well, also, that standard deviation is due to *random noise*, probably – juanpa.arrivillaga Nov 19 '22 at 01:51
  • @TimRoberts, list comprehensions date back to v2.0, 2000, https://peps.python.org/pep-0202/.. But that history shouldn't matter with the 3.0 rewrite. I read that at one point Guido wanted to omit `map` from py3. – hpaulj Nov 19 '22 at 02:33
  • `[int(a) for a in a1.tolist()]` and `list(map(int, a1.tolist()))` are faster in both cases, and calling `tolist()` has a greater impact on performance than the choice between the two methods. `a1.astype(int)` is ~150x faster. – Michael Szczesny Nov 19 '22 at 03:33
  • @Michael, it does. Do you know the reasons? – ILS Nov 19 '22 at 05:36
  • **Don't do that**: both are very inefficient. Consider using a vectorized NumPy operation rather than comparing overheads of the CPython interpreter. For example: `a1.astype(np.int64)` is 300 times faster on my machine. If you really want a list, then you can call `tolist()`, which is still 16 times faster. Consider reading [this post](https://stackoverflow.com/questions/69584027) to understand why iterating over NumPy arrays that way is slow. – Jérôme Richard Nov 19 '22 at 11:57
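
The two effects described in the comments above (where the name `int` gets resolved, and the per-item function call that `map` imposes when it is given a `lambda`) can be checked with a small timing sketch. This is only an illustration under some assumptions: it uses plain Python floats instead of the NumPy array from the question so that it depends on the standard library alone, and the function names (`with_bound_name`, `best_ms`, and so on) are made up for this example. Absolute numbers will vary by machine and Python version, and newer interpreters may narrow or erase the gap.

```python
import random
import timeit

# Assumption: plain Python floats stand in for the NumPy array from the question.
values = [random.random() * 31 for _ in range(100_000)]


def with_comprehension(data):
    # `int` is resolved through the global/builtin namespaces on every iteration.
    return [int(x) for x in data]


def with_map(data):
    # `int` is resolved once, when the arguments to map() are evaluated.
    return list(map(int, data))


def with_bound_name(data, _int=int):
    # Hypothetical variant: `_int` is bound in the function's own scope,
    # so the loop body never consults the global/builtin namespaces.
    return [_int(x) for x in data]


def expr_comprehension(n):
    # The expression is evaluated inline, with no function call per item.
    return [(x + 3) / (x + 4) for x in range(n)]


def expr_map_lambda(n):
    # map() has to call the lambda once per item.
    return list(map(lambda x: (x + 3) / (x + 4), range(n)))


def best_ms(func, arg, number):
    """Best-of-3 average time per call, in milliseconds."""
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=3)
    return min(timings) / number * 1e3


if __name__ == "__main__":
    for func in (with_comprehension, with_map, with_bound_name):
        print(f"{func.__name__:20s} {best_ms(func, values, 5):8.3f} ms")
    for func in (expr_comprehension, expr_map_lambda):
        print(f"{func.__name__:20s} {best_ms(func, 1_000, 200):8.3f} ms")
```

If the explanation in the comments holds on your interpreter, `with_bound_name` should land closer to `with_map` than to `with_comprehension`, and `expr_comprehension` should beat `expr_map_lambda`. Taking the minimum over several repeats is one common way to reduce the random noise mentioned above.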
