
Comparing `math.sqrt(…)` against the built-in power operator, `… ** 0.5`, in a REPL using `timeit.timeit(…)`, it looks as though the operator has a slight edge over the math module function:

[Image: REPL session showing the `timeit` results]

… although I had to crank the number of iterations up to 1,000,000 to be able to get visible differences (note that I did compensate for the module-function lookup overhead by directly importing sqrt from math in the relevant test).
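For reference, here is a minimal sketch of the kind of comparison described above. It is not the exact session from the screenshot: the `compare_sqrt` helper and the test value are hypothetical, chosen only for illustration. It times a variable rather than a literal, on the assumption that a constant expression such as `2 ** 0.5` could be folded at compile time and skew the result.

```python
import timeit


def compare_sqrt(number=1_000_000, value=1234.5678):
    """Time sqrt(x) against x ** 0.5 and return the elapsed seconds for each."""
    # sqrt is imported directly in the setup code, so the timed statement
    # does not pay for a math.sqrt attribute lookup on every iteration.
    setup_func = f"from math import sqrt; x = {value!r}"
    setup_op = f"x = {value!r}"
    return {
        "sqrt(x)": timeit.timeit("sqrt(x)", setup=setup_func, number=number),
        "x ** 0.5": timeit.timeit("x ** 0.5", setup=setup_op, number=number),
    }


if __name__ == "__main__":
    for label, seconds in compare_sqrt().items():
        print(f"{label:>9}: {seconds:.4f} s")
```

The absolute numbers will vary by machine and Python version, so only the relative difference between the two lines is worth reading.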

In this case, I haven’t yet delved into checking against more evidently off-the-shelf optimized codepaths – e.g. calling libm functions via an extension or a cython module – as I am trying to stick to my New Year’s resolution not to prematurely optimize more than once a week; unlike other related-question-askers, I’m after integer-result correctness before performance.
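To make "integer-result correctness" concrete, here is a rough sketch under two assumptions: that what is wanted is the exact floor of the square root of a non-negative `int`, and that Python 3.8 or later is available, since that is where `math.isqrt` was added. The helper `isqrt_via_float` is a hypothetical name; it shows one way to start from a float estimate and correct it with integer arithmetic only.

```python
import math


def isqrt_via_float(n):
    """Floor square root of a non-negative int, seeded by a float estimate."""
    if n < 0:
        raise ValueError("n must be non-negative")
    r = int(math.sqrt(n))  # float estimate; may be off for large n
    # Nudge the estimate using exact integer arithmetic only.
    while r * r > n:
        r -= 1
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


if __name__ == "__main__":
    n = (10**20 + 1) ** 2 - 1  # just below a perfect square
    print(math.isqrt(n))       # exact integer result (Python 3.8+)
    print(isqrt_via_float(n))  # corrected float-based result
```

Neither `math.sqrt` nor `** 0.5` promises an exact integer result once the input no longer fits in a float's 53-bit significand, which is why the correction loops compare with integers rather than trusting the float.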

This is specifically an operation in the hot loop of a currently synchronous image-processing function that uses PIL/Pillow to render an image as a monotone dot-screen halftone, and which right now is about two orders of magnitude too slow. It can’t be readily Cython-ized or numpy-ized because it currently uses Pillow’s vector-drawing API. (FYI, I am not afraid to go the distance with those elements of the problem: for example, I already replaced calls into Pillow’s ImageStat.Stat class with faster, lower-overhead bespoke functions, which was a demonstrable and not at all premature act of optimization.)

My question is: what are the important advantages and disadvantages of both of these methods (and/or any additional square-root methodologies I didn’t name) – both in the general sense as well as those that are specifically germane to the Python math environment? And, more importantly, are these issues eclipsed by other issues, such as the aforementioned overhead concerns like attribute-lookup, function dispatch, and the like?

I have often found that there are straightforward performance answers to each of these elements as a unit, but when Python code needs to meet performance standards, all of these sub-concerns ride the line through a gray area of misunderstanding – I therefore do especially hope to hear from anyone who has worked on a problem such as this and understands what to take seriously.


‡ The lineage of the halftone code itself stems from this excellent and time-tested SO answer.

fish2000
  • If it isn't easily vectorized, you should still consider something like `numpy` with `numba`. – juanpa.arrivillaga Mar 18 '19 at 16:58
  • @juanpa.arrivillaga `numba` is that `llvm` backend for `numpy.ndarray`, right? I’d still have to go back to synchronous unvectorized GIL’d Python to call the Pillow drawing API, I’d assume. If the math speedups are, like, tremendous, it’d be worth splitting the function into calculation- and draw-phases… can I wring those kind of gains out of such a function with `numba`? – fish2000 Mar 18 '19 at 17:02
  • If you are doing for-loops over `numpy.ndarray` objects, then *yes, tremendous gains*. I'd check it out for yourself and play with it. If you are worried about the difference between a function call and an operator, then you'll be pleasantly surprised by this. – juanpa.arrivillaga Mar 18 '19 at 17:04
  • @juanpa.arrivillaga I do always love an excuse to learn something new – I will give `numba` a shot (although I do recall it was somewhat of a pain to install, like it used a customized setup scheme that required a local `llvm` source build or somesuch – have the `numba` devs progressed on that front?) – fish2000 Mar 18 '19 at 17:14
  • Not quite a duplicate, but one of the linked questions may be: https://stackoverflow.com/q/327002/270986. It's important to note that `math.sqrt` is highly likely to be correctly-rounded (it usually ends up being a wrapper around a dedicated CPU sqrt instruction), while `** 0.5` is far from guaranteed to give correctly-rounded answers. Also, you may want to do timings in situations where the `sqrt` name lookup is a local one rather than an expensive global lookup. (E.g., with a hack that looks like `def do_stuff(my_data, _sqrt=sqrt)` followed by using `_sqrt` in place of `sqrt` in `do_stuff`.) – Mark Dickinson Mar 18 '19 at 18:37
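The default-argument hack from the last comment might look roughly like the sketch below. Only the signature `do_stuff(my_data, _sqrt=sqrt)` comes from the comment; the function body is an assumption made up for illustration.

```python
from math import sqrt


def do_stuff(my_data, _sqrt=sqrt):
    """Return the square root of each item in my_data."""
    # _sqrt is bound once, when the function is defined, so inside the loop
    # it is resolved as a local name instead of a global lookup.
    results = []
    for x in my_data:
        results.append(_sqrt(x))
    return results


if __name__ == "__main__":
    print(do_stuff([4.0, 9.0, 16.0]))
```

An explicit `for` loop is used so that `_sqrt` really is a local of `do_stuff`; whether the saving is measurable for a given workload is something to time rather than assume.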
