OK, let's try this again. I have one set of data. I want to make two copies and sort each copy in descending order on a different column. Then I want to compute the cumulative sum of the respective column in each copy. When I run the following code, the two places where I print setA[x][2] give different results.

set = [[2,2,0],[1,3,0],[3,1,0]]

def getkey_setA (item):
    return item[0]
setA = sorted(set, key=getkey_setA, reverse=True)

def getkey_setB (item):
    return item[1]
setB = sorted(set, key=getkey_setB, reverse=True)

setA[0][2] = setA[0][0]
setB[0][2] = setB[0][1]

for x in range(1, 3):
    setA[x][2] = setA[x-1][2] + setA[x][0]
    print(setA[x][2])

for x in range(1, 3):
    setB[x][2] = setB[x-1][2] + setB[x][1]

for x in range(1, 3):
    print (setA[x][2])

This produces:

5
6
8
6

but I expected it to produce

5
6
5
6

instead.

Martijn Pieters
Jellybeard

1 Answer

sorted() creates a shallow copy of the sequence being sorted. This means that your nested lists are not copied, they are merely referenced:

>>> set = [[2,2,0],[1,3,0],[3,1,0]]
>>> setA = sorted(set, key=getkey_setA, reverse=True)
>>> setB = sorted(set, key=getkey_setB, reverse=True)
>>> setA[0] is set[2]
True
>>> setB[2] is set[2]
True
>>> setA[0] is setB[2]
True

So the last element in set is exactly the same object as setA[0] and setB[2]. Making changes to any one of those references is reflected in the others:

>>> setA[0][2]
0
>>> setA[0][2] = 42
>>> setB[2]
[3, 1, 42]
>>> set[2]
[3, 1, 42]

This is why the set object (from which you produced your sorted setA and setB lists) is also changed after running your code:

>>> set
[[2, 2, 8], [1, 3, 6], [3, 1, 9]]

You need to create a proper copy of the nested lists; you could use the copy.deepcopy() function to create a recursive copy of the list objects, or you could use a generator expression when sorting:

setA = sorted((subl[:] for subl in set), key=getkey_setA, reverse=True)
setB = sorted((subl[:] for subl in set), key=getkey_setB, reverse=True)

This shallowly copies the nested lists; this is fine because those nested lists only contain immutable objects themselves.
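As a minimal sketch of the copy.deepcopy() route mentioned above: the helper name sorted_with_running_total and the variable name data (used instead of set, to avoid shadowing the built-in) are my own illustrative choices, not part of the original code. Each call works on its own copy of the rows, so writing the running total into index 2 of one result cannot leak into the other or into the source list.

```python
import copy
from operator import itemgetter

# Same values as in the question; renamed from `set` (assumption: any name works).
data = [[2, 2, 0], [1, 3, 0], [3, 1, 0]]


def sorted_with_running_total(rows, column):
    """Return independent copies of rows, sorted descending by `column`,
    with the cumulative sum of that column stored at index 2."""
    # deepcopy duplicates the nested lists, unlike sorted() on its own.
    result = sorted(copy.deepcopy(rows), key=itemgetter(column), reverse=True)
    total = 0
    for row in result:
        total += row[column]
        row[2] = total
    return result


set_a = sorted_with_running_total(data, 0)
set_b = sorted_with_running_total(data, 1)

print([row[2] for row in set_a])  # [3, 5, 6]
print([row[2] for row in set_b])  # [3, 5, 6]
print(data)                       # [[2, 2, 0], [1, 3, 0], [3, 1, 0]]
```

Both results now end in 5 and 6, which is what the question expected, and the source list keeps its zeros.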

Martijn Pieters
  • OK, thank you, I get it. Could I also 1) sort the data for setA, 2) record my results in the output file, 3) sort the data for setB, 4) record my results in the output file? This way I avoid having multiple copies of set, which in my real-life problem has millions of entries. – Jellybeard Nov 01 '16 at 23:36
  • @Jellybeard: Perhaps you want to use [`itertools.accumulate()`](https://docs.python.org/3/library/itertools.html#itertools.accumulate) to produce the accumulated data instead? That'd avoid using the 3rd element as a variable and lets you keep sharing the ordering. Or better yet, use the `pandas` project to do the sorting and accumulation for you; it is likely far more efficient at this task. – Martijn Pieters Nov 02 '16 at 08:32
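
The itertools.accumulate() suggestion in the last comment could look roughly like the sketch below. The helper name running_totals and the variable name data are hypothetical, chosen only for illustration. Because the totals are produced as a separate sequence instead of being written into the third element, the rows are never mutated, so the shallow copy made by sorted() is harmless and the nested lists do not need to be duplicated.

```python
from itertools import accumulate
from operator import itemgetter

# Same values as in the question; `data` replaces `set` (illustrative name).
data = [[2, 2, 0], [1, 3, 0], [3, 1, 0]]


def running_totals(rows, column):
    """Sort rows descending by `column` and return the cumulative sums
    of that column as a new list, leaving the rows themselves untouched."""
    ordered = sorted(rows, key=itemgetter(column), reverse=True)
    return list(accumulate(row[column] for row in ordered))


totals_a = running_totals(data, 0)
totals_b = running_totals(data, 1)

print(totals_a)  # [3, 5, 6]
print(totals_b)  # [3, 5, 6]
print(data)      # [[2, 2, 0], [1, 3, 0], [3, 1, 0]]
```

For very large inputs, sorted() still builds a second list of references to the same rows. Calling data.sort(key=..., reverse=True) in place and writing out each result before re-sorting would avoid even that, at the cost of losing the original order.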