Numpy.copy not working as intended for random sampling

Question

I have an issue with numpy.copy not working as intended.

When I use it as follows, changes to the copy still show up in my original list, which I want to avoid:

import random
import numpy as np
test = np.copy(random.sample(population_list, 2))
test[0][0][0][0] = 1.1111

If I print population_list after the assignment to test, the value at that position of population_list has been replaced with 1.1111. My goal is to sample from the list and then make some changes to those samples, without affecting the initial list.

As further info, my population_list is a list of lists, where the first element of each inner list is a NumPy array:

print(type(population_list))
print(type(population_list[0]))
print(type(population_list[0][0]))
print(type(population_list[0][0][0][0]))

list
list
numpy.ndarray
numpy.float64

Later edit: This is what the data looks like. Sorry for the odd format; classes are beyond me at the moment.

So you have a list of lists, and at least one element of an inner list is a numpy array of floats. I wondered why you were repeating the `[]`, and not using something like `[0,0]`. Do you know what `random.sample` is producing? My guess is a couple of the nested lists. And what is `np.copy` copying? A list, the array, or something else? Why all the nesting? — hpaulj, Jun 27 '16 at 02:29
Actually it's 3 levels of list nesting. `random.sample` keeps that nesting. But `np.copy` wraps it in an array, producing a 4d array, unless there's some intermediate level of object dtype array. — hpaulj, Jun 27 '16 at 02:37
We need to see the actual `population_list`, or at least a smaller version that behaves the same way. Before complaining about `np.copy` we need to understand the input. The `random.sample` part probably does not matter, since that is operating with lists. — hpaulj, Jun 27 '16 at 02:39
The short version is that I use all the nesting because I don't know how to use classes yet. I'm trying to copy the initial list of nested lists. As for the sample, I'm sampling 2 of the lists (and their elements) within the overall grand list, if that makes sense. Thanks for the feedback, I'll be digesting your answer shortly. I'll add a screenshot of the data in a second as well, for future reference. — Silviu Tofan, Jun 27 '16 at 12:40

hpaulj · Accepted Answer · 2016-06-27T03:11:54.990

To elaborate on my comments, I'll try to recreate your list:

In [202]: x=np.arange(10.)

In [223]: x
Out[223]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [224]: ll=[[[x]]]    # a list

In [225]: ll[0]    # still a list
Out[225]: [[array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])]]

In [226]: ll[0][0]   # still a list
Out[226]: [array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])]

In [227]: ll[0][0][0]    # an array
Out[227]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [228]: ll[0][0][0][0]   # a float
Out[228]: 0.0

In [229]: random.sample(ll,1)  # same list nesting
Out[229]: [[[array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])]]]

In [230]: y=np.copy(ll)    # a 4d array

In [231]: y
Out[231]: array([[[[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.]]]])

In [232]: y.shape
Out[232]: (1, 1, 1, 10)
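
As a quick check of this 4d case, here is a minimal sketch that assumes the same x and ll as above. It uses np.shares_memory to test whether the copy overlaps the original array, and a multi-axis index like [0, 0, 0, 0] in place of the chained [0][0][0][0]:

```python
import numpy as np

x = np.arange(10.0)
ll = [[[x]]]                 # three levels of list nesting around one array

y = np.copy(ll)              # the nested list is converted to one float array
y[0, 0, 0, 0] = 99.0         # multi-axis index into the 4d array

# x keeps its original first value and shares no memory with y
print(y.shape, y.dtype, x[0], np.shares_memory(x, y))
```

In this homogeneous case the copy is fully independent, because every float ends up in a freshly allocated numeric buffer.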

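If ll contained sublists of different sizes, we'd get an object array instead (on NumPy 1.24 and later, ragged input like this raises a ValueError unless dtype=object is requested):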

In [233]: ll.append([[2]])

In [234]: ll
Out[234]: [[[array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])]], [[2]]]

In [235]: random.sample(ll,2)
Out[235]: [[[2]], [[array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])]]]

In [236]: np.copy(ll)
Out[236]: 
array([[[array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])]],

       [[2]]], dtype=object)  # (2,1,1) shape

After playing around with this some more: if the result of np.copy is a 4d numeric array, then modifying an element of it does not modify ll or x. But if there is an intermediate object-array level, then modifying y also modifies ll and x, because the object array only holds references to the original array. It behaves like a shallow list copy (ll[:]) rather than a deep copy.

In [270]: ll=[[[x]],[[1,2,3]]]

In [271]: ll
Out[271]: [[[array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])]], [[1, 2, 3]]]

In [272]: y=np.copy(ll)

In [273]: y
Out[273]: 
array([[[array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])]],
       [[1, 2, 3]]], dtype=object)

In [274]: y[0][0][0][0]=1

In [275]: y
Out[275]: 
array([[[array([ 1.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])]],
       [[1, 2, 3]]], dtype=object)

In [276]: ll
Out[276]: [[[array([ 1.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])]], [[1, 2, 3]]]

In [277]: x
Out[277]: array([ 1.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
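
The shallow-versus-deep distinction can also be sketched with plain list operations, with no object array involved. This is only an illustration under assumed data (a fresh x and ll shaped like the ones above), not a description of what np.copy does internally:

```python
import copy

import numpy as np

x = np.arange(10.0)
ll = [[[x]], [[1, 2, 3]]]

shallow = ll[:]             # new outer list, same inner objects
deep = copy.deepcopy(ll)    # new objects at every level

shallow_shares = shallow[0][0][0] is x   # the very same array object
deep_shares = deep[0][0][0] is x         # a separate array

shallow[0][0][0][0] = 1.0   # writes through to x
deep[0][0][0][1] = -1.0     # leaves x alone

print(shallow_shares, deep_shares, x[:2])
```

The shallow copy only duplicates the outermost list, so anything mutable below it is still shared with the original.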

In sum, np.copy does not preserve the structure of nested lists of lists. It tries to make an array instead. What you should be using is copy.deepcopy. That preserves the list structure, and copies values all the way down.
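
A minimal sketch of that suggestion follows. The real population_list is not shown in the question, so the data below is a made-up stand-in (a list of lists whose first element is a NumPy array), and the helper name sample_independent is hypothetical:

```python
import copy
import random

import numpy as np


def sample_independent(population, k):
    """Return k randomly chosen items as deep copies of the originals."""
    # random.sample returns a new list, but its items are the same objects
    # as in `population`; deepcopy duplicates the nested lists and arrays.
    return copy.deepcopy(random.sample(population, k))


# Hypothetical stand-in for the question's data.
population_list = [[np.zeros((2, 2)), i] for i in range(5)]

test = sample_independent(population_list, 2)
test[0][0][0][0] = 1.1111   # changes only the copy

# Every array in the original population is still all zeros.
untouched = all(np.all(item[0] == 0.0) for item in population_list)
print(untouched)
```

Copying after sampling, rather than deep-copying the whole population first, means only the k selected items are duplicated.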

Thank you very much for this very detailed answer. It made sense much faster than things like this usually do for me. If anyone stumbles upon this in the future, also refer to http://stackoverflow.com/questions/184710/what-is-the-difference-between-a-deep-copy-and-a-shallow-copy which I found quite helpful to complement @hpaulj's reply — Silviu Tofan, Jun 27 '16 at 12:51
