After I apply an operation to a list, I would like to get access to both the modified list and the original one. Somehow I am not able to.

In the following code snippet, I define two functions with which I modify the original list. Afterwards, I get my values from a class and apply the transformation.

def get_min_by_col(li, col):                          # get minimum from list
    return min(li, key=lambda x: x[col - 1])[col - 1]

def hashCluster(coords):                              # transform to origin
    min_row = get_min_by_col(coords,0)
    min_col = get_min_by_col(coords,1)
    for pix in coords:
        pix[1] = pix[1] - min_row
        pix[0] = pix[0] - min_col
    return (coords)


pixCoords = hashCoords = originalPixCoords = []       # making sure they are empty
for j in dm.getPixelsForCluster(dm.clusters[i]):
    pixCoords.append([j['m_column'], j['m_row']])     # getting some values from a class -- ex:   [[613, 265], [613, 266]]   or    [[615, 341], [615, 342], [616, 341], [616, 342]] 
originalPixCoords = pixCoords.copy()                  # just to be safe, I make a copy of the original list
print ('Original : ', originalPixCoords)              
hashCoords = hashCluster(pixCoords)                   # apply transformation
print ('Modified : ', hashCoords)
print ('Original : ', originalPixCoords)              # should get the original list

Some results [Jupyter Notebook]:

Original :  [[607, 268]]
Modified :  [[0, 0]]
Original :  [[0, 0]]

Original :  [[602, 264], [603, 264]]
Modified :  [[0, 0], [1, 0]]
Original :  [[0, 0], [1, 0]]

Original :  [[613, 265], [613, 266]]
Modified :  [[0, 0], [0, 1]]
Original :  [[0, 0], [0, 1]]

Is the function hashCluster able to modify the new list as well? Even after the .copy()?

What am I doing wrong? My goal is to have access to both the original and modified lists, with as few operations and list copies as possible (since I am looping over a very large document).

nyw

4 Answers

1

Use copy.deepcopy:

import copy
originalPixCoords = copy.deepcopy(pixCoords)
Igor Rivin
  • Thank you! This does work indeed. Is there any way I can overcome copying this? As I will have a large number of iterations and each time I will play with two rather large lists. – nyw May 16 '20 at 21:57
  • @nyw Well, you do want both the old and the new lists, so one way or another you have to copy. – Igor Rivin May 16 '20 at 22:49
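On the cost question raised in the comments: when the data is exactly two levels deep (a list of flat lists of numbers, as in the question), copying each inner list once is enough to protect the saved copy, without the general machinery of copy.deepcopy. The following is a minimal sketch under that assumption; the sample values are taken from the question, and the in-place loop stands in for the original hashCluster.

```python
# Sample data in the same shape as the question's pixel lists.
pixCoords = [[613, 265], [613, 266]]

# Copy the outer list and each inner list once (exactly one level deep).
# This is only sufficient because the inner lists hold plain integers.
originalPixCoords = [pix[:] for pix in pixCoords]

# Stand-in for the in-place transformation from the question.
for pix in pixCoords:
    pix[0] -= 613
    pix[1] -= 265

print(pixCoords)          # [[0, 0], [0, 1]]
print(originalPixCoords)  # [[613, 265], [613, 266]]
```

If the inner elements were themselves nested containers, this would again share objects, and a deep copy (or a function that builds a new list) would be needed.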
1

What you're using is a shallow copy. It creates a new outer list, but the elements of that list still refer to the same objects as the old one. If those objects are modified, the new list reflects the changes, because both lists point at the same objects in memory.

>>> # Shallow Copy
>>> mylist = []
>>> mylist.append({"key": "original"})
>>> mynewlist = mylist.copy()
>>> mynewlist
[{'key': 'original'}]
>>> mylist[0]["key"] = "new value"
>>> mylist
[{'key': 'new value'}]
>>> mynewlist
[{'key': 'new value'}]

>>> # Now Deep Copy
>>> mylist = []
>>> mylist.append({"key": "original"})
>>> from copy import deepcopy
>>> mynewlist = deepcopy(mylist)
>>> mynewlist
[{'key': 'original'}]
>>> mylist[0]["key"] = "new value"
>>> mylist
[{'key': 'new value'}]
>>> mynewlist
[{'key': 'original'}]

Another similar question: What is the difference between shallow copy, deepcopy and normal assignment operation?
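The same effect can be checked directly on a list of lists like the one in the question, using the `is` operator to compare object identity. This is a small sketch with sample values from the question; the variable names `coords` and `shallow` are made up for illustration.

```python
coords = [[602, 264], [603, 264]]   # sample values from the question
shallow = coords.copy()

print(shallow is coords)        # False: the outer lists are different objects
print(shallow[0] is coords[0])  # True: the inner lists are shared

coords[0][0] = 0                # mutate an inner list through one name
print(shallow[0])               # [0, 264]: the change is visible through the other

coords.append([9, 9])           # change the outer list itself
print(len(shallow))             # 2: the shallow copy's outer list is unaffected
```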

Kamori
  • How will making a copy influence the speed of the program, since I will iterate this over a very large document and making these copies quite a few times? Can I overcome this without a deepcopy in any efficient way? – nyw May 16 '20 at 21:56
1

Setting multiple variables equal to the same value makes them all refer to the same object, which is Python's equivalent of a pointer.

Check this out

a = b = [1, 2, 3]
a == b    # True
a is b    # True (same memory location)
b[1] = 3
print(b)  # [1, 3, 3]
print(a)  # [1, 3, 3]

Right now, you are creating shallow copies. If you need two independent copies (with different values and data history), make a deep copy:

import copy

data = [1, 2, 3]                       # example data
original = data                        # another name for the same list
original_copy = copy.deepcopy(data)    # an independent copy
original_copy == original == data      # True
original_copy is original              # False
original_copy[0] = 4
original_copy == original              # False
justahuman
  • So, in Python, variables are references!! I actually thought they are independent, C++ -like. Is this the case with lists alone? It seems that after I convert my data to a tuple, the behavior is different. It keeps account of the old vs new. – nyw May 16 '20 at 22:05
  • This applies to all mutable data structures. Tuples are immutable (though mutable objects stored inside them can still change), so any "modification" creates an entirely new tuple. – justahuman May 17 '20 at 02:38
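The tuple behaviour discussed in these comments can be illustrated with a short sketch. It assumes the same sample coordinates as the question; the names `inner`, `as_tuple`, `flat`, and `shifted` are hypothetical. A tuple cannot be reassigned element by element, but a list stored inside it can still be mutated. A tuple of tuples, by contrast, can only be "changed" by building a new one.

```python
inner = [613, 265]
as_tuple = (inner, [613, 266])

# as_tuple[0] = [0, 0] would raise TypeError: tuples do not support item assignment.
inner[0] = 0                      # but the list inside the tuple is still mutable
print(as_tuple)                   # ([0, 265], [613, 266])

# With tuples all the way down, a transformation has to build new objects,
# so the original is left untouched.
flat = tuple(tuple(p) for p in [[613, 265], [613, 266]])
shifted = tuple((col - 613, row - 265) for col, row in flat)
print(flat)                       # ((613, 265), (613, 266))
print(shifted)                    # ((0, 0), (0, 1))
```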
1

You have a list of lists, and are modifying the inner lists. The operation pixCoords.copy() creates a shallow copy of the outer list only. pixCoords and originalPixCoords are now two separate list buffers whose elements point to the same mutable inner lists. There are two ways to handle this situation, each with its own pros and cons.

The knee-jerk reaction most users seem to have is to make a deep copy:

import copy
originalPixCoords = copy.deepcopy(pixCoords)

I would argue that this method is the less Pythonic and more error-prone approach. A better solution would be to make hashCluster actually return a new list. By doing that, you make it treat the input as immutable and eliminate the problem entirely. I consider this more Pythonic because it reduces the maintenance burden. Also, by convention, Python functions that return a value create a new object without modifying the input, while in-place operations generally don't return a value (compare sorted(), which returns a new list, with list.sort(), which sorts in place and returns None).

def hashCluster(coords):
    min_row = get_min_by_col(coords, 0)
    min_col = get_min_by_col(coords, 1)
    return [[pix[0] - min_col, pix[1] - min_row] for pix in coords]
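As a quick check, here is a self-contained sketch that combines the question's get_min_by_col with the non-mutating hashCluster above and runs it on one of the sample inputs from the question. No explicit copy is made: the only new list is the one the function returns.

```python
def get_min_by_col(li, col):
    # As in the question: with col=0 this indexes x[-1] (the row),
    # with col=1 it indexes x[0] (the column).
    return min(li, key=lambda x: x[col - 1])[col - 1]

def hashCluster(coords):
    # Build and return a new list; the input is never modified.
    min_row = get_min_by_col(coords, 0)
    min_col = get_min_by_col(coords, 1)
    return [[pix[0] - min_col, pix[1] - min_row] for pix in coords]

pixCoords = [[613, 265], [613, 266]]   # sample values from the question
hashCoords = hashCluster(pixCoords)

print('Modified : ', hashCoords)   # Modified :  [[0, 0], [0, 1]]
print('Original : ', pixCoords)    # Original :  [[613, 265], [613, 266]]
```

Because the original list is left intact, the call to .copy() in the question's loop becomes unnecessary.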
Mad Physicist