I'm trying to make a shuffled copy of an array called tst. I've defined a copy of tst called batch, and then shuffled batch so that tst remains intact & unshuffled. However, in doing so, I'm finding that shuffling batch also (for some reason) shuffles tst in the process.

To fully understand my dilemma, consider the following code snippet:

# First code snippet
import numpy as np

tst = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
batch = tst
print(tst)
print(batch)
seed = 42
np.random.seed(seed)
np.random.shuffle(batch)
print(tst)
print(batch)

When I run my code, the outputs that correspond to this code snippet look like this:

[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[8 1 5 0 7 2 9 4 3 6]
[8 1 5 0 7 2 9 4 3 6]

...whereas I'd think it would look like this:

[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[8 1 5 0 7 2 9 4 3 6]

Loosely speaking, my first thought was that tst and batch are "looking" at the same location in memory (I'm not experienced in programming, so I apologize if my terminology is wrong), so updating the value at that location would update every variable "looking" at the same place. However, if this were the case, then I would assume that running the following code:

# Second code snippet
a = 5
b = a
print(a)
print(b)
a = 3
print(a)
print(b)

...would output:

5
5
3
3

However, this is not the case. Instead, it outputs:

5
5
3
5

Truth be told, the output behavior of the second code snippet is what I initially thought would happen with the first code snippet, as this seems to make much more sense to me. Performing an operation on one variable shouldn't affect any other "equal" variables, unless explicitly specified by some supplemental code or something.

I'm hoping to understand why the first code snippet behaves differently from the second, and what needs to change so that I can shuffle batch without also shuffling tst. I've been looking around online for an answer, but everything I find is either too advanced for my current skill set or doesn't pertain to this exact issue. Any help would be appreciated. Thanks in advance!

Jacob M

1 Answer

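To get an independent copy of an array, you have to use ndarray.copy or a similar method. batch = tst does not copy anything; it just creates another variable that refers to the same array. In Python, as in many other languages, variables are references to the "real piece of data". Plain assignment is a safe way to keep a backup of immutable data such as scalars or tuples, because those can never be changed in place. With mutable data types, a change made in place (a mutation) through one variable is visible through every other variable that refers to the same object, and np.random.shuffle is exactly such an in-place operation. In your second snippet, a = 3 does not modify the value 5; it points the name a at a different object, so b still refers to 5. Take extra care with arrays, lists, dictionaries, objects, and any other mutable data types.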
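As a minimal sketch of the fix, reusing the variable names from the question: copy the array first, then shuffle the copy. The name shuffled and the use of np.random.permutation are additions for illustration, not something from the question; np.random.permutation returns a shuffled copy of an array and leaves its argument alone.

```python
import numpy as np

tst = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# ndarray.copy() allocates a new array with its own data,
# so batch no longer refers to the same object as tst.
batch = tst.copy()

np.random.seed(42)
np.random.shuffle(batch)  # shuffles batch in place; tst is untouched

print(tst)    # still in the original order
print(batch)  # shuffled

# Alternative (illustrative): permutation() returns a shuffled copy
# and does not modify the array passed to it.
shuffled = np.random.permutation(tst)
print(tst)       # still in the original order
print(shuffled)
```

Either approach keeps tst intact. The copy-then-shuffle form is the closest to the original code.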

https://numpy.org/doc/stable/reference/generated/numpy.ndarray.copy.html#numpy.ndarray.copy
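The difference between the two snippets in the question can be sketched as follows. Assigning to a name points that name at another object, while an in-place operation changes the object that all of its names share. The name same and the small three-element arrays are made up for illustration.

```python
import numpy as np

# Rebinding: "a = 3" points the name a at a different object.
# The object 5 is not modified, so b still refers to it.
a = 5
b = a
a = 3
print(a, b)  # 3 5

# Aliasing: batch and tst are two names for one array.
tst = np.array([0, 1, 2])
batch = tst
same = batch is tst
print(same)  # True

# In-place change: the shared array itself is modified,
# so the change is visible through both names.
batch[0] = 99
print(tst[0])  # 99

# Rebinding batch afterwards does not affect tst,
# just like "a = 3" did not affect b.
batch = np.array([7, 8, 9])
print(tst[0])  # 99
```

So the two snippets follow the same rule; the first one mutates a shared object, and the second one only reassigns a name.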

Serge