
Let's say I have these two snippets of code in Python:

1 --------------------------
import numpy as np
x = np.array([1,2,3,4])
y = x
x = x + np.array([1,1,1,1])
print(y)

2 --------------------------
import numpy as np
x = np.array([1,2,3,4])
y = x
x += np.array([1,1,1,1])
print(y)

I thought the result for y would be the same in both examples, since y points to x and x becomes (2,3,4,5), BUT it wasn't.

The results were (1,2,3,4) for snippet 1 and (2,3,4,5) for snippet 2.

After some research I found out that in the first example:

#-First example---------------------------------------
x = np.array([1,2,3,4]) # create x --> [1,2,3,4]
y = x                   # make y point to x
# until now we have x --> [1,2,3,4]
#                          |
#                          y
x = x + np.array([1,1,1,1])
# however, this operation **creates a new array** [2,3,4,5]
# and makes x point to it instead of the first one
# so we have y --> [1,2,3,4] and x --> [2,3,4,5]

#-Second example--------------------------------------
x = np.array([1,2,3,4]) # create x --> [1,2,3,4]
y = x                   # make y point to x
# until now the same x --> [1,2,3,4]
#                            |
#                            y
x += np.array([1,1,1,1])
# this operation **modifies the existing array**
# so the result is:
# the same array x --> [2,3,4,5]
#                       |
#                       y
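A quick way to verify this (a small sketch using Python's built-in id(), which returns an object's identity):

import numpy as np

x = np.array([1,2,3,4])
y = x
print(id(x) == id(y))        # True: both names refer to the same array

x = x + np.array([1,1,1,1])
print(id(x) == id(y))        # False: + created a new array and rebound x

x = y                        # point x back at the shared array
x += np.array([1,1,1,1])
print(id(x) == id(y))        # True: += modified the shared array in place
print(y)                     # [2 3 4 5]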

You can find out more about this behavior (not only for this example) at this link: In-place algorithm

My question is: being aware of this behavior, why should I use an in-place algorithm in terms of performance? (Faster execution time? Less memory allocation? ...)

EDIT: Clarification

The example with (+, +=) was just a simple way of explaining the in-place algorithm to those who don't know it, but the question is about the use of in-place algorithms in general, not only in this case.

As another, more complex example: loading a CSV file (just 10 million rows) into a variable, then sorting the result. Is the idea of an in-place algorithm to produce the output in the same memory space that contains the input, by successively transforming that data until the output is produced? This avoids the need to use twice the storage: one area for the input and an equal-sized area for the output (using the minimum amount of RAM, hard disk, ...).
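A minimal sketch of that idea (the CSV load is simulated here with random data; np.sort and ndarray.sort are both standard numpy calls):

import numpy as np

# simulate ~10 million values that would have been loaded from a CSV
data = np.random.rand(10_000_000)   # one float64 array, about 80 MB

out = np.sort(data)   # copying sort: allocates a second ~80 MB array for the output
data.sort()           # in-place sort: reorders the existing buffer, no second array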

AnouarZ
  • I think you should use the operator that semantically makes more sense as that will perform better in terms of development time spent to make the code do what you ask (and keep the code doing what you ask). If you profile and find that you have a bottleneck somewhere, then you can optimize by doing _experiments_ with _real data_ and at that point you can definitively say which way is faster (or if there is no difference at all). My guess is that in most cases, `numpy` will have a free block of memory so there won't be a huge performance difference, but I _could_ be wrong. – mgilson Nov 15 '17 at 12:37
  • If you care about this sort of micro-optimization, don't use Python. If the job is so complex/large that a minute difference matters it's probably worth re-coding it in Java or even C. – Jared Smith Nov 15 '17 at 12:42
  • I don't understand the question. Since there is a clear semantic difference between `+` and `+=` for numpy, what does performance matter? – tobias_k Nov 15 '17 at 12:44
  • Why the downvotes, guys? They are different operators... – alexisdevarennes Nov 15 '17 at 12:47
  • It's not about what gains this might give OP. This is an interesting question in general! – alexisdevarennes Nov 15 '17 at 12:47
  • *Because* `+=` is intended to modify the underlying object, it can be made more efficient (by only modifying parts of the original object, rather than modifying a copy). – chepner Nov 15 '17 at 12:48
  • But he is asking about performance, knowing they do different things! – alexisdevarennes Nov 15 '17 at 12:49
  • @tobias_k the performance gain comes when I want to modify a variable. The idea is to produce an output in the same memory space that contains the input, by successively transforming that data until the output is produced. This avoids the need to use twice the storage: one area for the input and an equal-sized area for the output. – AnouarZ Nov 15 '17 at 12:53
  • Well, `x` and `y` point to the same array, so if you _modify_ `x`, you also modify `y`. To me, this question seems to be a mix of "difference of `+` and `+=` on lists" and "making a copy of a list". – tobias_k Nov 15 '17 at 13:04
  • @tobias_k that's not true when you don't use `+=`: if you modify `x` by `x = x + som`, the program allocates a new place in memory, stores the result there, and makes `x` point to the new result, while `y` still points to the old one – AnouarZ Nov 15 '17 at 13:08
  • @AnouarZ Yes, exactly as with `+` vs. `+=` for lists. The point is, with `+`, you _do not_ modify `x`; you create _a new_ array and assign that to `x`. – tobias_k Nov 15 '17 at 13:22

1 Answer


x = x + 1 vs x += 1

Performance

It seems that you understand the semantic difference between x += 1 and x = x + 1.

For benchmarking, you can use timeit in IPython.

After defining those functions:

import numpy as np
def in_place(n):
    x = np.arange(n)
    x += 1

def not_in_place(n):
    x = np.arange(n)
    x = x + 1

def in_place_no_broadcast(n):
    x = np.arange(n)
    x += np.ones(n, dtype=int)  # plain int: np.int is removed in recent numpy

You can simply use the %timeit syntax to compare performance:

%timeit in_place(10**7)
20.3 ms ± 81.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit not_in_place(10**7)
30.4 ms ± 253 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit in_place_no_broadcast(10**7)
35.4 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

not_in_place is 50% slower than in_place.

Note that broadcasting also makes a huge difference: numpy understands x += 1 as adding 1 to every single element of x, without having to create yet another array.

Warning

in_place should be the preferred function: it's faster and uses less memory. You might run into bugs if you use and mutate this object at different places in your code, though. The typical example would be:

x = np.arange(5)
y = [x, x]
y[0][0] = 10
y
# [array([10,  1,  2,  3,  4]), array([10,  1,  2,  3,  4])]
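If you need independent arrays instead of shared references, an explicit copy avoids this surprise (a sketch using ndarray.copy(), a standard numpy method):

x = np.arange(5)
y = [x.copy(), x.copy()]   # two independent copies instead of two references to x
y[0][0] = 10
y
# [array([10,  1,  2,  3,  4]), array([0, 1, 2, 3, 4])]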

Sorting

Your understanding of the advantages of in-place sorting is correct. It can make a huge difference in memory requirements when sorting large data sets.

There are other desirable features for a sorting algorithm (stable, acceptable worst-case complexity, ...) and it looks like the standard Python algorithm (Timsort) has many of them.

Timsort is a hybrid algorithm. Some parts of it are in-place and some require extra memory; it will never use more than n/2 extra space, though.
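To make that distinction concrete in plain Python (a small sketch; both calls below use Timsort under the hood):

data = [5, 3, 1, 4, 2]
copy = sorted(data)   # returns a new sorted list, 'data' is left untouched
data.sort()           # sorts the list object itself; only Timsort's merge
                      # buffers (at most n/2 references) are allocated on top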

Eric Duminil
  • yes I got it: with `y = [x, x]` you didn't create a new array, you just made `y` point to `x` twice.. so `y[0][0] = 10` will change `x` to become `[10,1,2,3,4]`... – AnouarZ Nov 15 '17 at 13:22
  • @AnouarZ: Exactly. This behaviour can be more subtly hidden though, e.g. with this [bug/feature](https://stackoverflow.com/questions/1132941/least-astonishment-and-the-mutable-default-argument). – Eric Duminil Nov 15 '17 at 13:27
  • I was looking into the broadcasting algorithm and the post you linked. It was written there that [broadcasting may be a bad idea](https://stackoverflow.com/questions/47309818/when-broadcasting-is-a-bad-idea-numpy). I created another question on that subject, in case you have an explanation of how and when that happens – AnouarZ Nov 15 '17 at 14:23