This post https://stackoverflow.com/a/5541452/6394617

suggests a way to make a Numpy array immutable, using .flags.writeable = False

However, when I test this:

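import numpy as np

arr = np.arange(20).reshape((4,5))
arr.flags.writeable = False  # mark the array as read-only
arr  # display the array before shuffling

# shuffle each column through a 1-d view
for i in range(5):
    np.random.shuffle(arr[:,i])

arr  # display the array after shuffling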

The array is shuffled in place, without even a warning.
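
For comparison, here is a minimal sketch of what the flag does block, assuming plain indexed assignment. The helper name `try_write` is made up for illustration. Writing through the array, or through a view of it, is refused, which is why the silent shuffle is surprising.

```python
import numpy as np

arr = np.arange(20).reshape((4, 5))
arr.flags.writeable = False

# A view taken from a read-only array is read-only as well.
col = arr[:, 0]
view_is_writeable = col.flags.writeable

def try_write(a):
    """Hypothetical helper: True if an element assignment succeeds,
    False if NumPy refuses it with a ValueError."""
    try:
        a[0] = -1
        return True
    except ValueError:
        return False

wrote_array = try_write(arr)  # assignment through the array itself
wrote_view = try_write(col)   # assignment through the column view
print(view_is_writeable, wrote_array, wrote_view)
```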

QUESTION: Is there a way to make the array immutable?

BACKGROUND:

For context, I'm doing machine learning, and I have feature arrays, X, which are floats, and label arrays, y, which are ints.

I'm new to Scikit-learn, but from what I've read, it seems like the fit methods shuffle the arrays in place. That said, when I created two arrays, fit a model to the data, and inspected the arrays afterwards, they were in the original order. So I'm just not familiar with how Scikit-learn shuffles, and haven't been able to find an easy explanation to that online yet.

I'm using many different models, and doing some preprocessing in between, and I'm worried that at some point my two arrays may get shuffled so that the rows no longer correspond appropriately.

It would give me peace of mind if I could make the arrays immutable. I'm sure I could switch to tuples instead of NumPy arrays, but I suspect that would be more complicated to code and slower.
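
Independently of the `writeable` flag, one way to check that X and y were not changed is to fingerprint them before and after a step. This is only a sketch: `fingerprint` is a hypothetical helper, not part of NumPy or Scikit-learn, and the small arrays stand in for real data. It detects a change after the fact; it does not prevent one.

```python
import hashlib
import numpy as np

def fingerprint(a):
    """Hypothetical helper: hash an array's shape, dtype, and contents."""
    h = hashlib.sha256()
    h.update(str(a.shape).encode())
    h.update(str(a.dtype).encode())
    h.update(np.ascontiguousarray(a).tobytes())
    return h.hexdigest()

# Placeholder data standing in for the real feature and label arrays.
X = np.arange(12, dtype=float).reshape((4, 3))
y = np.arange(4)

before = (fingerprint(X), fingerprint(y))

# ... call fit or a preprocessing step here ...

unchanged = (fingerprint(X), fingerprint(y)) == before
print("arrays unchanged:", unchanged)
```

If `unchanged` comes back `False` after some step, that step modified one of the arrays in place.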

juanpa.arrivillaga
Joe
  • I'm going to mess up the terminology, but `arr[:, i]` returns something like a "view" of the data, not the array itself. `np.random.shuffle(x)` will throw an error – Paul H Dec 17 '21 at 14:57
  • scikit-learn's `fit` shouldn't shuffle the columns. If it shuffles anything, it should do the whole row. – Quang Hoang Dec 17 '21 at 15:26
  • @PaulH, I noticed the same thing. `np.random.shuffle(arr)` throws an error, but the code in my post does not. I am trying to find out if there is a way to make `arr` immutable, so that nothing can change it. Perhaps there is not. Perhaps I have not even framed the question correctly, since I know that a tuple that contains mutable elements can have its mutable elements mutated. But the bottom line is that I want to know if there is a way that I can prevent the array from changing at all. – Joe Dec 17 '21 at 16:12
  • @QuangHoang, I know that scikit-learn shuffles by default (rows, not columns), but I was surprised when I called `X.flags.writeable = False` before `clf.fit(X,y)` and it did not cause any errors, since it seemed to me like `fit` was going to try to shuffle the data in place, but should not have been able to. So I'm not sure how the scikit-learn library shuffles the data. I haven't dug through every line of source code, and don't really have time to, which is why I was hoping there was some way to just lock the array, in a way that prevented ***any*** changes to it. – Joe Dec 17 '21 at 16:22
  • What I'm saying is that `arr` is immutable, but `arr[:, i]` is a view, and therefore you can do anything to it. If you want each individual row to be immutable, make a list of immutable rows – Paul H Dec 17 '21 at 16:26
  • The problem is not that `arr[:, i]` is a view, but that it is a one-dimensional array. It looks like the `shuffle` method does not respect the `writeable` flag when the input is a 1-d array. E.g. `x = np.arange(5); x.flags.writeable = False; np.random.shuffle(x)` succeeds. This might be a bug in the `shuffle` method. – Warren Weckesser Dec 17 '21 at 16:50
  • FYI: This has been [fixed](https://github.com/numpy/numpy/pull/20621) in the numpy development version. – Warren Weckesser Dec 20 '21 at 20:42
  • @WarrenWeckesser, that's great, thanks! Do you want to post that as an answer, so that if anyone has this question in the future, they will see that they just need to make sure to have the latest version of NumPy? – Joe Dec 21 '21 at 14:46

1 Answer

This is a bug in `numpy.random.shuffle` in numpy versions 1.22 and earlier. The function does not respect the `writeable` flag of the input array when the array is one-dimensional.

`numpy.random.Generator.shuffle` has the same issue, and `numpy.random.Generator.permuted` fails to respect the `writeable` flag for arrays of any dimension.

This has been fixed in the main development branch of NumPy, so NumPy versions 1.23.0 and later will not have this bug. Note that NumPy 1.22.0 has not been released yet, but is available as a release candidate. The fix occurred after the branching of 1.22, so the fix will not be in 1.22.0.
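
To see which behavior an installed NumPy has, a small probe along these lines may help. It is a sketch: the function name `shuffle_respects_readonly` is hypothetical, and it assumes a fixed version rejects the read-only input with a `ValueError`.

```python
import numpy as np

def shuffle_respects_readonly():
    """Hypothetical probe: True if np.random.shuffle refuses a
    read-only 1-d array, False if it shuffles it anyway."""
    x = np.arange(5)
    x.flags.writeable = False
    try:
        np.random.shuffle(x)
    except ValueError:
        # Assumed behavior of fixed versions: read-only input is rejected.
        return True
    # Affected versions reach this point without raising.
    return False

print(np.__version__, shuffle_respects_readonly())
```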

Warren Weckesser