56

Is it possible to map a NumPy array in place? If yes, how?

Given `a_values`, a 2D array, this is the bit of code that does the trick for me at the moment:

for row in range(len(a_values)):
    for col in range(len(a_values[0])):
        a_values[row][col] = dim(a_values[row][col])

But it's so ugly that I suspect somewhere within NumPy there must be a function that does the same thing, looking something like:

a_values.map_in_place(dim)

but if something like the above exists, I've been unable to find it.

mac
  • you could do `a_values = np.vectorize(dim)(a_values)` and avoid the nested loops but that's still not in place, so it's not the answer. – Dan D. Jul 26 '11 at 01:02
  • I don't know of a function that will do this, but if there is one, it only makes the code look neater. If you want the performance speedup that is characteristic of Numpy, then you need to re-write the dim() function to work on numpy arrays directly. – Bob Jul 26 '11 at 01:07
  • @eryksun yes it would, but that's still not an in-place operation, so it's not much better, and it might incur an additional copy over what I stated – Dan D. Jul 26 '11 at 01:58
  • I made what I believe was a gallant attempt at abusing `vectorize` but I'm giving up now. Seconding Bob. – senderle Jul 26 '11 at 02:33
  • @senderle - For whatever it's worth, your gallant attempt seems to work perfectly for me... (And is pretty slick, all things considered) Out of vague curiosity, where was it going wrong? – Joe Kington Jul 26 '11 at 05:07
  • @Joe Kington, two problems: First, it turns out every time a `vectorize`d function is called, the function it wraps is called _twice_ on the first value in the array, as a way of determining the type of the return value! So using `vectorize` to do in-place work leads to the double-application of the function on the first value. So you have to skirt that behavior. And second, the payoff for such ugliness is nil; the result is no faster than nested for loops (by my tests, anyway). – senderle Jul 26 '11 at 05:11
  • @senderle - Ah, right. Now that I think about it, I only tested it on things that started with `0`! I think if you specify the output dtype (the `otype` kwarg) it won't be called twice, though (untested...). Of course, this doesn't get around the speed problem. There's a fair bit of overhead in a python function call, so I don't think going to cython or straight C for the rest would improve things, either. – Joe Kington Jul 26 '11 at 05:16
  • @Joe, I thought the same thing about `otype` but I couldn't get it to work. It could be that I was just doing it wrong though. – senderle Jul 26 '11 at 14:38
  • Performance aside, `resize` can reshape the array in place – Samuel Oct 24 '13 at 08:42
  • Can't quite figure out why ndarray doesn't provide an `apply` method ... – matanster May 12 '20 at 10:52

5 Answers

55

It's only worth trying to do this in-place if you are under significant space constraints. If that's the case, it is possible to speed up your code a little bit by iterating over a flattened view of the array. Since reshape returns a new view when possible, the data itself isn't copied (unless the original has unusual structure).

I don't know of a better way to achieve bona fide in-place application of an arbitrary Python function.

>>> def flat_for(a, f):
...     a = a.reshape(-1)
...     for i, v in enumerate(a):
...         a[i] = f(v)
... 
>>> a = numpy.arange(25).reshape(5, 5)
>>> flat_for(a, lambda x: x + 5)
>>> a

array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])

Some timings:

>>> a = numpy.arange(2500).reshape(50, 50)
>>> f = lambda x: x + 5
>>> %timeit flat_for(a, f)
1000 loops, best of 3: 1.86 ms per loop

It's about twice as fast as the nested loop version:

>>> a = numpy.arange(2500).reshape(50, 50)
>>> def nested_for(a, f):
...     for i in range(len(a)):
...         for j in range(len(a[0])):
...             a[i][j] = f(a[i][j])
... 
>>> %timeit nested_for(a, f)
100 loops, best of 3: 3.79 ms per loop

Of course vectorize is still faster, so if you can make a copy, use that:

>>> a = numpy.arange(2500).reshape(50, 50)
>>> g = numpy.vectorize(lambda x: x + 5)
>>> %timeit g(a)
1000 loops, best of 3: 584 us per loop

And if you can rewrite dim using built-in ufuncs, then please, please, don't vectorize:

>>> a = numpy.arange(2500).reshape(50, 50)
>>> %timeit a + 5
100000 loops, best of 3: 4.66 us per loop

numpy does operations like += in place, just as you might expect -- so you can get the speed of a ufunc with in-place application at no cost. Sometimes it's even faster! See here for an example.
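For instance, a minimal sketch (the array and the numbers here are purely illustrative):

>>> a = numpy.arange(5)
>>> a += 5                        # modifies a's buffer in place; no new array
>>> numpy.multiply(a, 2, out=a)   # most ufuncs also accept an explicit output array
array([10, 12, 14, 16, 18])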


By the way, my original answer to this question, which can be viewed in its edit history, is ridiculous, and involved vectorizing over indices into a. Not only did it have to do some funky stuff to bypass vectorize's type-detection mechanism, it turned out to be just as slow as the nested loop version. So much for cleverness!

senderle
  • Thank you for this (+1) - I will test it when I start working later today. As for _why_ I need this.... arrays are used internally by [`pygame.surfarray.pixels2d`](http://pygame.org/docs/ref/surfarray.html). The array is a reference to the image pixel values, not a copy of it, so I need to change the array I got from pygame if I want my image/sprite to be modified in the scene. That said, this is my very first time with numpy, so if I missed something, you are welcome to rectify my understanding! :) – mac Jul 26 '11 at 07:44
  • @mac, if that's the case, then I would recommend eryksun's solution, as posted in the comments: `a_values[:] = np.vectorize(dim)(a_values)`. It creates a copy, but slice assignment (`a_values[:]`) alters the array in-place. Let me know if that doesn't work. – senderle Jul 26 '11 at 13:57
  • @mac, also, if you plan to reuse the vectorized version of `dim`, it's probably wise to give it its own name, so that you aren't calling `vectorize` all the time. – senderle Jul 26 '11 at 14:02
  • @mac, and finally, if you _can_ rewrite dim using ufuncs, then slice assignment + ufunc_dim (`a_values[:] = ufunc_dim(a_values)`) will be the best solution, hands down. – senderle Jul 26 '11 at 14:04
  • @eryksun, based on mac's new comments, your solution is the best one. An answer from you would have my upvote. – senderle Jul 26 '11 at 14:06
  • "Hands down" is an understatement!!! (see my own answer). On another note: it seems that slice-assignment is the most relevant improvement for non-numpy alternatives, if one really want to go that way. – mac Jul 26 '11 at 17:14
  • @mac, just fyi, I have since learned that in-place ufunc operations (like `+=`, `*=`) in `numpy` are performed, well, in-place. – senderle Apr 19 '12 at 21:20
46

This is a write-up of contributions scattered across answers and comments, which I wrote after accepting the answer to the question. Upvotes are always welcome, but if you upvote this answer, please don't forget to also upvote those of senderle and (if (s)he writes one) eryksun, who suggested the methods below.

Q: Is it possible to map a numpy array in place?
A: Yes, but not with a single array method; you have to write your own code.

Below is a script that compares the various implementations discussed in the thread:

import timeit
from numpy import array, arange, vectorize, rint

# SETUP
get_array = lambda side : arange(side**2).reshape(side, side) * 30
dim = lambda x : int(round(x * 0.67328))

# TIMER
def best(fname, reps, side):
    global a
    a = get_array(side)
    t = timeit.Timer('%s(a)' % fname,
                     setup='from __main__ import %s, a' % fname)
    return min(t.repeat(reps, 3))  #low num as in place --> converge to 1

# FUNCTIONS
def mac(array_):
    for row in range(len(array_)):
        for col in range(len(array_[0])):
            array_[row][col] = dim(array_[row][col])

def mac_two(array_):
    li = range(len(array_[0]))
    for row in range(len(array_)):
        for col in li:
            array_[row][col] = int(round(array_[row][col] * 0.67328))

def mac_three(array_):
    for i, row in enumerate(array_):
        array_[i][:] = [int(round(v * 0.67328)) for v in row]


def senderle(array_):
    array_ = array_.reshape(-1)
    for i, v in enumerate(array_):
        array_[i] = dim(v)

def eryksun(array_):
    array_[:] = vectorize(dim)(array_)

def ufunc_ed(array_):
    multiplied = array_ * 0.67328
    array_[:] = rint(multiplied)

# MAIN
r = []
for fname in ('mac', 'mac_two', 'mac_three', 'senderle', 'eryksun', 'ufunc_ed'):
    print('\nTesting `%s`...' % fname)
    r.append(best(fname, reps=50, side=50))
    # The following is for visually checking that the functions return the same results
    tmp = get_array(3)
    eval('%s(tmp)' % fname)
    print tmp
tmp = min(r)/100
print('\n===== ...AND THE WINNER IS... =========================')
print('  mac (as in question)       :  %.4fms [%.0f%%]') % (r[0]*1000,r[0]/tmp)
print('  mac (optimised)            :  %.4fms [%.0f%%]') % (r[1]*1000,r[1]/tmp)
print('  mac (slice-assignment)     :  %.4fms [%.0f%%]') % (r[2]*1000,r[2]/tmp)
print('  senderle                   :  %.4fms [%.0f%%]') % (r[3]*1000,r[3]/tmp)
print('  eryksun                    :  %.4fms [%.0f%%]') % (r[4]*1000,r[4]/tmp)
print('  slice-assignment w/ ufunc  :  %.4fms [%.0f%%]') % (r[5]*1000,r[5]/tmp)
print('=======================================================\n')

The output of the above script - at least on my system - is:

  mac (as in question)       :  88.7411ms [74591%]
  mac (optimised)            :  86.4639ms [72677%]
  mac (slice-assignment)     :  79.8671ms [67132%]
  senderle                   :  85.4590ms [71832%]
  eryksun                    :  13.8662ms [11655%]
  slice-assignment w/ ufunc  :  0.1190ms [100%]

As you can observe, using numpy's ufuncs increases speed by more than two and almost three orders of magnitude compared with the second-best and worst alternatives, respectively.

If using ufuncs is not an option, here's a comparison of the other alternatives only:

  mac (as in question)       :  91.5761ms [672%]
  mac (optimised)            :  88.9449ms [653%]
  mac (slice-assignment)     :  80.1032ms [588%]
  senderle                   :  86.3919ms [634%]
  eryksun                    :  13.6259ms [100%]

HTH!

mac
  • This is one of the best self-answers I've seen, and deserves an upvote :). – senderle Jul 26 '11 at 17:37
  • Also, the chunked slice-assignment trick is interesting (in `mac_three`), and I find myself wondering if you could achieve a persuasive compromise between space- and time- efficiency using a ufunc in place of a list comprehension -- by processing, say, 10% of the array per iteration, or something like that. – senderle Jul 26 '11 at 17:54
  • @senderle - Thank you for both the appreciation and the input for solving the question! ;) In my application I use this function only to generate about 100 10x10-pixel sprites at initialisation time, so I'm not really in pursuit of hyper-optimisation... My original question was truly just inspired by the wish to make my code neater / learn something new, but I posted the source of the test precisely to allow others to keep on playing with this, if they so wish! :) – mac Jul 26 '11 at 22:50
  • I know this is old, but three comments. 1. I would make `dim` etc. local in all of the cases to reduce overhead and better show the proportions of the differences between the cases. 2. It's possible Senderle's can be micro-optimized by using `array_set = array_.__setitem__; any(array_set(i, dim(x)) for i, x in enumerate(array_))`. 3. I'm not sure eryksun's version is truly in-place. Has this been traced? In some cases the right hand item in slice assignment is fully evaluated to speed up the actual assignment, so a copy is transiently created. – agf Nov 10 '12 at 16:12
  • I am trying to use this as an exercise to learn the arcana of Python. However, with Python 3.3 I get the following error: Traceback (most recent call last): File "C:\Users\Adriano\Google Drive\python\test.py", line 56, in print(' mac (as in question) : %.4fms [%.0f%%]') % (r[0]*1000,r[0]/tmp) TypeError: unsupported operand type(s) for %: 'NoneType' and 'tuple' – aag Jul 19 '14 at 19:31
  • @aag - The syntax for string formatting has changed in py3. Documentation is [here](https://docs.python.org/3.1/library/string.html#format-specification-mini-language). – mac Jul 19 '14 at 21:31
  • I know that this question is quite old ... but I wonder why [numpy.frompyfunc](http://docs.scipy.org/doc/numpy/reference/generated/numpy.frompyfunc.html) wasn't mentioned, because this provides a ufunc with an `out` argument, which in turn allows in-place operations. I tried this: `numpy.frompyfunc(dim, 1, 1)(array_, out=array_)` and this is slightly faster than the eryksun solution. – Matthias Jan 18 '15 at 22:04
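A rough sketch of the chunked compromise senderle floats a few comments above: process the array one block at a time, so you keep ufunc speed but the temporaries stay chunk-sized. The chunk size, the names and the 0.67328 constant are just illustrative; this function is not part of the benchmark above.

from numpy import arange, rint

def chunked_ufunc(array_, chunk=250):
    flat = array_.reshape(-1)               # view, no copy
    for start in range(0, flat.shape[0], chunk):
        block = flat[start:start + chunk]   # also a view
        block[:] = rint(block * 0.67328)    # ufunc speed, but only a chunk-sized temporary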
3

Why not use the numpy implementation and the `out` argument trick?

from numpy import array, arange, vectorize, rint, multiply, round as np_round 

def fmilo(array_):
    np_round(multiply(array_, 0.67328, array_), out=array_)

got:

===== ...AND THE WINNER IS... =========================
  mac (as in question)       :  80.8470ms [130422%]
  mac (optimised)            :  80.2400ms [129443%]
  mac (slice-assignment)     :  75.5181ms [121825%]
  senderle                   :  78.9380ms [127342%]
  eryksun                    :  11.0800ms [17874%]
  slice-assignment w/ ufunc  :  0.0899ms [145%]
  fmilo                      :  0.0620ms [100%]
=======================================================
fabrizioM
  • It looks nice... but it doesn't work! :( The results you are getting out of this function (`[[ 0 20 40][ 60 80 100][121 141 161]]`) are - for some reason - inconsistent with those of the other tested ones (`[[ 0 20 40][ 61 81 101][121 141 162]]`). If you can fix this, I'll be happy to include your solution in my answer + upvote yours! :) – mac Jul 29 '11 at 00:39
  • @mac, @fabrizioM, I think I see what's happening. When you pass an output array to a numpy ufunc via `out`, it automatically [casts the result](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#output-type-determination) to the type of the output array. So in this case, the floating point result is cast to an int (and thus truncated) before being stored. So `fmilo` is functionally equivalent to `array_ *= 0.67328`. To get the desired rounding behavior, you have to do something like `rint((array_ * 0.67328), array_)`. But on my machine that's actually slower than slice assignment. – senderle Jul 29 '11 at 16:02
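A sketch of the correction senderle describes in the comment above, adapted from `fmilo` (note: recent NumPy versions require `casting='unsafe'` to write the rounded float result back into an integer array; older versions performed that cast implicitly):

from numpy import arange, rint

array_ = arange(9).reshape(3, 3) * 30
rint(array_ * 0.67328, out=array_, casting='unsafe')  # round, then cast float -> int into array_'s own buffer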
2

This is an updated version of mac's write-up for Python 3.x, with numba and numpy.frompyfunc added.

numpy.frompyfunc takes an arbitrary Python function and returns a ufunc-like function which, applied to a numpy.array, calls the original function elementwise.
However, the result has datatype object, so the operation is not in place, and future calculations on such an array would be slower.
To avoid this drawback, the test calls numpy.ndarray.astype to convert the datatype back to int.
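A minimal sketch of that pattern outside the benchmark (the names are illustrative):

from numpy import arange, frompyfunc

dim = lambda x: int(round(x * 0.67328))
a = arange(9).reshape(3, 3) * 30

udim = frompyfunc(dim, 1, 1)      # ufunc-like wrapper around the Python function
b = udim(a)                       # new array with dtype=object, not in place
a[:] = b.astype(a.dtype)          # cast back to int and write into the original buffer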

As a side note:
Numba isn't part of the Python standard library and has to be installed separately if you want to test it. In this test it actually does nothing, and if it had been called with @jit(nopython=True), it would have raised an error saying that it can't optimize anything there. However, since numba can often speed up code written in a functional style, it is included for completeness.

import timeit
from numpy import array, arange, vectorize, rint, frompyfunc
from numba import autojit

# SETUP
get_array = lambda side : arange(side**2).reshape(side, side) * 30
dim = lambda x : int(round(x * 0.67328))

# TIMER
def best(fname, reps, side):
    global a
    a = get_array(side)
    t = timeit.Timer('%s(a)' % fname,
                     setup='from __main__ import %s, a' % fname)
    return min(t.repeat(reps, 3))  #low num as in place --> converge to 1

# FUNCTIONS
def mac(array_):
    for row in range(len(array_)):
        for col in range(len(array_[0])):
            array_[row][col] = dim(array_[row][col])

def mac_two(array_):
    li = range(len(array_[0]))
    for row in range(len(array_)):
        for col in li:
            array_[row][col] = int(round(array_[row][col] * 0.67328))

def mac_three(array_):
    for i, row in enumerate(array_):
        array_[i][:] = [int(round(v * 0.67328)) for v in row]


def senderle(array_):
    array_ = array_.reshape(-1)
    for i, v in enumerate(array_):
        array_[i] = dim(v)

def eryksun(array_):
    array_[:] = vectorize(dim)(array_)

@autojit
def numba(array_):
    for row in range(len(array_)):
        for col in range(len(array_[0])):
            array_[row][col] = dim(array_[row][col])


def ufunc_ed(array_):
    multiplied = array_ * 0.67328
    array_[:] = rint(multiplied)

def ufunc_frompyfunc(array_):
    udim = frompyfunc(dim, 1, 1)
    array_[:] = udim(array_).astype("int")  # cast the object result back to int and write it in place

# MAIN
r = []
totest = ('mac', 'mac_two', 'mac_three', 'senderle', 'eryksun', 'numba','ufunc_ed','ufunc_frompyfunc')
for fname in totest:
    print('\nTesting `%s`...' % fname)
    r.append(best(fname, reps=50, side=50))
    # The following is for visually checking that the functions return the same results
    tmp = get_array(3)
    eval('%s(tmp)' % fname)
    print(tmp)
tmp = min(r)/100
results = list(zip(totest,r))
results.sort(key=lambda x: x[1])

print('\n===== ...AND THE WINNER IS... =========================')
for name,time in results:
    Out = '{:<34}: {:8.4f}ms [{:5.0f}%]'.format(name,time*1000,time/tmp)
    print(Out)
print('=======================================================\n')



And finally, the results:

===== ...AND THE WINNER IS... =========================
ufunc_ed                          :   0.3205ms [  100%]
ufunc_frompyfunc                  :   3.8280ms [ 1194%]
eryksun                           :   3.8989ms [ 1217%]
mac_three                         :  21.4538ms [ 6694%]
senderle                          :  22.6421ms [ 7065%]
mac_two                           :  24.6230ms [ 7683%]
mac                               :  26.1463ms [ 8158%]
numba                             :  27.5041ms [ 8582%]
=======================================================
Sanitiy
2

If ufuncs are not possible, you should maybe consider using Cython. It is easy to integrate and gives big speedups on specific uses of numpy arrays.

LBarret
  • True (+1). If you provide a snippet to achieve this, I'll be glad to integrate it in my answer with testing etc... – mac Jul 27 '11 at 10:05