4

I am reading the source code of Python's numpy library and found the following snippet. It seems to perform element-wise operations on vectors (numpy.ndarray); for example, numpy.multiply([1,2,3], [4,5,6]) returns [4, 10, 18].

#define BASE_UNARY_LOOP(tin, tout, op) \
    UNARY_LOOP { \
        const tin in = *(tin *)ip1; \
        tout * out = (tout *)op1; \
        op; \
    }
#define UNARY_LOOP_FAST(tin, tout, op) \
    do { \
    /* condition allows compiler to optimize the generic macro */ \
    if (IS_UNARY_CONT(tin, tout)) { \
        if (args[0] == args[1]) { \
            BASE_UNARY_LOOP(tin, tout, op) \
        } \
        else { \
            BASE_UNARY_LOOP(tin, tout, op) \
        } \
    } \
    else { \
        BASE_UNARY_LOOP(tin, tout, op) \
    } \
    } \
    while (0)

It looks very weird to me, especially the comment inside UNARY_LOOP_FAST. How does "if A then X else X" logic help the compiler optimize anything?

Paul Panzer
  • 51,835
  • 3
  • 54
  • 99
bbvan
  • 618
  • 1
  • 7
  • 17
  • Also, if things are required to be done only once then why `do{ ...}while(0);`? – sameerkn Feb 09 '17 at 12:08
  • @sameerkn do{…}while(0) is common and has a purpose, see [Why use apparently meaningless do-while and if-else statements in C/C++ macros?](https://stackoverflow.com/questions/154136/why-use-apparently-meaningless-do-while-and-if-else-statements-in-c-c-macros); a short sketch of the idiom follows these comments. – Jonas Schäfer Feb 09 '17 at 12:26
  • Maybe it's telling the compiler to consider the special case where the `tout` array is the same as `tin`, eg `np.sin(a, out=a)`. – hpaulj Feb 09 '17 at 15:23
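
For readers following that link, here is a minimal, hypothetical illustration (not from the numpy source) of why multi-statement macros are wrapped in do { ... } while (0): the wrapper turns the expansion into a single statement that behaves correctly inside an unbraced if/else.

/* Hypothetical example, not from the numpy source. */
#define SWAP_BAD(a, b)  { int tmp = (a); (a) = (b); (b) = tmp; }
#define SWAP_GOOD(a, b) do { int tmp = (a); (a) = (b); (b) = tmp; } while (0)

void demo(int x, int y, int cond)
{
    if (cond)
        SWAP_GOOD(x, y);   /* expands to do { ... } while (0); -- one statement */
    else
        x = y;
    /* With SWAP_BAD, the semicolon after the closing brace would terminate
     * the if statement, and the following `else` would be a syntax error. */
}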

2 Answers

2

Without more context, you can't tell what kind of optimizations are made possible by the numpy code snippet, but it's probably similar to this simplified example:

#define LOOP(op) for (int i = 0; i < n; i++) op

void f(int *a, int *b, int n, int c) {
  if (c == 1) {   /* identical loop body, but here the compiler knows c == 1 */
    LOOP(a[i] += b[i] * c);
  }
  else {          /* generic path: c may be anything */
    LOOP(a[i] += b[i] * c);
  }
}

A modern compiler can eliminate the multiplication in the first branch because it knows that c == 1 there. In the example above you could simply have written LOOP(a[i] += b[i]) in the first branch, but if the if statement is itself part of another macro that takes op as a parameter, that isn't possible.

The basic idea is to force the compiler to generate code for multiple paths, some of which have preconditions that can be exploited for certain optimizations.
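
As an illustrative sketch (the function name is invented), this is roughly what the optimizer can reduce the first branch of f() to once the fact c == 1 is propagated into it:

/* Hypothetical result of optimizing the c == 1 branch of f() above:
 * with c known to be 1, b[i] * c folds to b[i] and the multiply disappears. */
void f_c_is_1(int *a, int *b, int n) {
  for (int i = 0; i < n; i++)
    a[i] += b[i];
}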

nwellnhof
  • 32,319
  • 7
  • 89
  • 113
0

This snippet is from

https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/loops.c.src

This particular snippet defines the looping macros for a single-argument ufunc, something like np.abs.

The comment immediately before this snippet is:

/*
 * loop with contiguous specialization
 * op should be the code working on tin in and
 * storing the result in tout * out
 * combine with NPY_GCC_OPT_3 to allow autovectorization
 * should only be used where its worthwhile to avoid code bloat
 */

The ufunc design allows usage like np.sin(a, out=b). Apparently the macro is telling the compiler to consider the special case where the tout array is the same as the tin array, e.g. np.sin(a, out=a).
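
As a rough sketch of that idea, using a plain function and invented names rather than numpy's actual macros, the pointer test splits the contiguous loop into an in-place case and a separate-output case, and the compiler optimizes each branch on its own:

#include <stddef.h>
#include <math.h>

/* Hypothetical stand-in for a contiguous unary loop such as sin: the branch
 * on in == out mirrors the args[0] == args[1] test, so one loop is compiled
 * for the in-place case and another for distinct input and output arrays. */
static void unary_sin_contig(const double *in, double *out, size_t n) {
  if (in == out) {
    for (size_t i = 0; i < n; i++)
      out[i] = sin(out[i]);   /* in-place: np.sin(a, out=a) */
  } else {
    for (size_t i = 0; i < n; i++)
      out[i] = sin(in[i]);    /* separate output: np.sin(a, out=b) */
  }
}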

Similarly, the fast binary ufunc macros allow for identity among the three arrays of np.add(a, b, out=c), which can then be implemented as c = a + b, a += b, or b += a, depending on which arrays coincide.
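
A similarly hedged sketch for the binary case (again invented names, not the actual numpy macros): checking which pointers coincide gives each aliasing pattern its own specialized loop.

#include <stddef.h>

/* Hypothetical contiguous binary "add" loop: branching on pointer identity
 * mirrors the kind of argument-identity checks described above, so each of
 * c = a + b, a += b and b += a gets its own loop body to optimize. */
static void binary_add_contig(const double *a, const double *b,
                              double *out, size_t n) {
  if (out == a) {
    for (size_t i = 0; i < n; i++)
      out[i] += b[i];          /* np.add(a, b, out=a): a += b */
  } else if (out == b) {
    for (size_t i = 0; i < n; i++)
      out[i] += a[i];          /* np.add(a, b, out=b): b += a */
  } else {
    for (size_t i = 0; i < n; i++)
      out[i] = a[i] + b[i];    /* np.add(a, b, out=c): c = a + b */
  }
}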


The timing differences below suggest that there is a modest amount of optimization in the case where args[0] == args[1]:

In [195]: a=np.ones((100,100))
In [197]: %%timeit b=np.ones((100,100))
     ...: np.sin(a, out=b)
1000 loops, best of 3: 343 µs per loop

In [198]: %%timeit b=np.ones((100,100))
     ...: np.sin(b, out=b)
1000 loops, best of 3: 279 µs per loop
hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • Is it possible that the time difference results from the latter case not needing to `malloc` new space for `b`? – bbvan Feb 10 '17 at 02:10
  • `b` is predefined in both cases. In any case the times are suggestive, not proof. – hpaulj Feb 10 '17 at 02:14