Python Shuffling Array Inadvertently

Question

I am trying to do a simple 1D-array shuffle in Python but keeping a copy of the original array. However, when calling shuffle commands (either np.random.shuffle or random.shuffle) Python will shuffle all of them in sync.

Example:

import numpy as np
arr = np.arange(10)
arr_backup = arr
print(arr)
print(arr_backup)
np.random.shuffle(arr)
print(arr)
print(arr_backup)

This prints:

[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[1 4 9 5 8 6 3 2 7 0]
[1 4 9 5 8 6 3 2 7 0]

I guess I am not understanding how Python allocates this item in the namespace or something. Any help is appreciated. Thanks.

`arr` and `arr_backup` reference the same object in memory. Just like with lists, dicts, etc., you have to explicitly copy the object to get an independent one. What you've essentially done here is make `arr_backup` an alias for `arr`. Modifying one will modify the other. You can test this by replacing `np.random.shuffle(arr)` with `np.random.shuffle(arr_backup)`. – HelpfulHound, Jan 13 '20 at 21:35
The assignment `arr_backup = arr` creates another name *for the same array*. When you then shuffle it, both names point to the same shuffled one. If you want a new copy of the array, try `arr_backup = arr.copy()`. — Lee Daniel Crocker, Jan 13 '20 at 21:37
So why doesn't this change `b` to 3 as well? `a = 2; b = a; a = 3; b` gives `2`. – user191919, Jan 13 '20 at 21:39
@user191919 because **you never change the `int` object** (you can't anyway, because `int` objects are immutable). You simply assign a new object, namely `3`, to the name `a`. The same thing works with `numpy.ndarray` objects, so `arr = np.arange(10); arr_backup = arr; arr = np.arange(20); print(arr_backup)` still prints the original ten-element array. – juanpa.arrivillaga, Jan 13 '20 at 22:00
@user191919 read the following: https://nedbatchelder.com/text/names.html — juanpa.arrivillaga, Jan 13 '20 at 22:02
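
To make the distinction in the comments above concrete, here is a minimal sketch contrasting rebinding a name with mutating the object it refers to. The variable names `nums` and `alias` are only illustrative and do not come from the original question.

```python
import numpy as np

# Rebinding: `a = 3` points the name `a` at a different int object.
# The object that `b` refers to is never touched.
a = 2
b = a
a = 3
print(b)  # 2

# Rebinding works the same way for arrays: `arr` gets a new object,
# while `arr_backup` still refers to the original ten-element array.
arr = np.arange(10)
arr_backup = arr
arr = np.arange(20)
print(arr_backup.shape)  # (10,)

# Mutation: changing the object in place is visible through every name
# that refers to it.
nums = np.arange(10)
alias = nums
nums[0] = 99
print(alias[0])  # 99
```

`np.random.shuffle` falls into the second category: it mutates the array it is given rather than rebinding a name.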

kmaork · Accepted Answer · 2020-01-13T21:53:35.177

3

All variables in Python hold references to objects. Assignment from one variable to another just copies that reference, so both arr and arr_backup point to the same object in memory.

np.random.shuffle mutates the array in place, so the change is visible through both references. To avoid that, make a real copy with arr_backup = arr.copy(). Note that arr_backup = arr[:] copies a list, but for a NumPy array it returns a view that shares the same data, so the backup would still be shuffled.
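
As a rough illustration, the sketch below compares three ways of getting a "second" array before shuffling: plain assignment, a full slice, and `copy()`. It assumes NumPy arrays specifically; the names `alias`, `view`, and `backup` are made up for this example, and `np.shares_memory` is used only to check whether two arrays overlap in memory.

```python
import numpy as np

arr = np.arange(10)

alias = arr          # another name for the very same object
view = arr[:]        # a new ndarray object that shares arr's data
backup = arr.copy()  # an independent array with its own data

np.random.shuffle(arr)  # shuffles arr in place

print(alias is arr)                   # True
print(view is arr)                    # False
print(np.shares_memory(arr, view))    # True: the view was shuffled too
print(np.shares_memory(arr, backup))  # False
print(backup)                         # [0 1 2 3 4 5 6 7 8 9]
```

Only `backup` keeps the original order, which is why `arr.copy()` (or `np.copy(arr)`) is the safe choice here.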

edited Jan 13 '20 at 21:53

answered Jan 13 '20 at 21:38


1

You don't want to call `list` because that'll drop the array. You want `arr.copy()` here – roganjosh Jan 13 '20 at 21:44
You're absolutely right :) – kmaork Jan 13 '20 at 21:53

Yatish Kadam · Answer 2 · 2020-01-13T21:41:41.950

0

Directly from the NumPy docs:

import numpy as np

x = np.array([1, 2, 3])
y = x           # y is another name for the same array
z = np.copy(x)  # z is an independent copy

Note that, when we modify x, y changes, but not z:

x[0] = 10
x[0] == y[0]  # True
x[0] == z[0]  # False

edited Jan 13 '20 at 21:41

answered Jan 13 '20 at 21:36

