So I frequently write code following a pattern like this:
_list = list(range(10))  # or whatever
_list = [some_function(x) for x in _list]
_list = [some_other_function(x) for x in _list]
# etc.
I recently saw a comment on a different question explaining that this approach creates a new list each time, and that it is better to mutate the existing list, like so:
_list[:] = [some_function(x) for x in _list]
It's the first time I've seen this explicit recommendation and I'm wondering what the implications are:
Does the mutation save memory? Presumably the reference count of the "old" list drops to zero after re-assignment and the "old" list is discarded, but is there a delay before that happens, during which I am potentially using more memory than I need to when I re-assign instead of mutating the list?
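To make sure I understand the reference side of this, here is a tiny sketch of what I mean (the names are purely illustrative):

import sys

_list = list(range(10))
alias = _list                      # a second reference to the same list object
print(sys.getrefcount(_list))      # typically 3: _list, alias, and getrefcount's own argument

_list = [x * 10 for x in _list]    # re-binds the name; the old list now survives only via `alias`
del alias                          # once the last reference is gone, CPython's reference counting
                                   # frees the old list straight away (no cycle, so no GC delay)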
Is there a computational cost to using mutation? I suspect changing something in place is more expensive than creating a new list and simply dropping the old one.
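A rough way one could measure this with timeit (just a sketch, with x + 1 standing in for some_function):

import timeit

setup = "_list = list(range(10_000))"

# Re-assignment: the name is re-bound to a brand-new list each run.
reassign = "_list = [x + 1 for x in _list]"

# Mutation: a new list is still built on the right, then copied into
# the existing list object via slice assignment.
mutate = "_list[:] = [x + 1 for x in _list]"

print("reassign:", timeit.timeit(reassign, setup=setup, number=1_000))
print("mutate:  ", timeit.timeit(mutate, setup=setup, number=1_000))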
In terms of safety, I wrote a script to test this:
def some_function(number: int):
    return number * 10


def main():
    _list1 = list(range(10))
    _list2 = list(range(10))

    a = _list1
    b = _list2

    _list1 = [some_function(x) for x in _list1]
    _list2[:] = [some_function(x) for x in _list2]

    print(f"list a: {a}")
    print(f"list b: {b}")


if __name__ == "__main__":
    main()
Which outputs:
list a: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list b: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
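As far as I can tell, the reason is that after re-assignment a and _list1 refer to different objects, while b and _list2 still share one object. A minimal way to check that (a sketch):

original = list(range(10))
alias = original
original = [x * 10 for x in original]   # re-assignment binds `original` to a new list
print(alias is original)                # False: `alias` still refers to the old object

original2 = list(range(10))
alias2 = original2
original2[:] = [x * 10 for x in original2]   # slice assignment mutates in place
print(alias2 is original2)                   # True: both names still share one object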
So mutation does seem to have the drawback of being more likely to cause side effects, although those might sometimes be desirable. Are there any PEPs that discuss this safety aspect, or other best-practice guides?
Thank you.
EDIT: Conflicting answers, and more tests on memory. I have received two conflicting answers so far. In the comments, jasonharper has written that the right-hand side of an assignment does not know about the left-hand side, and therefore memory usage cannot possibly be affected by what appears on the left. However, in the answers, Masoud has written that "when [reassignment] is used, two new and old _lists with two different identities and values are created. Afterward, old _list is garbage collected. But when a container is mutated, every single value is retrieved, changed in CPU and updated one-by-one. So the list is not duplicated." This seems to indicate that there is a big memory cost to doing reassignment.
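One thing both claims have to account for, as far as I can tell, is that the list comprehension on the right-hand side is evaluated to a complete new list before anything is assigned, regardless of whether the target is a name or a slice. A quick, illustrative way to see that is to disassemble both statements:

import dis

# In both cases the comprehension builds a full new list first; only the
# final instruction differs (a plain name store vs. a slice/subscript
# store, depending on the Python version).
dis.dis("_list = [x * 10 for x in _list]")
dis.dis("_list[:] = [x * 10 for x in _list]")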
I decided to try using memory-profiler to dig deeper. Here is the test script:
from memory_profiler import profile


def normalise_number(number: int):
    return number % 1000


def change_to_string(number: int):
    return "Number as a string: " + str(number) + "something" * number


def average_word_length(string: str):
    return len(string) / len(string.split())


@profile(precision=8)
def mutate_list(_list):
    _list[:] = [normalise_number(x) for x in _list]
    _list[:] = [change_to_string(x) for x in _list]
    _list[:] = [average_word_length(x) for x in _list]


@profile(precision=8)
def replace_list(_list):
    _list = [normalise_number(x) for x in _list]
    _list = [change_to_string(x) for x in _list]
    _list = [average_word_length(x) for x in _list]
    return _list


def main():
    _list1 = list(range(1000))
    mutate_list(_list1)

    _list2 = list(range(1000))
    _list2 = replace_list(_list2)


if __name__ == "__main__":
    main()
Please note that I am aware that, e.g., the average word length function isn't particularly well written; it is just there for testing's sake.
Here are the results:
Line # Mem usage Increment Line Contents
================================================
16 32.17968750 MiB 32.17968750 MiB @profile(precision=8)
17 def mutate_list(_list):
18 32.17968750 MiB 0.00000000 MiB _list[:] = [normalise_number(x) for x in _list]
19 39.01953125 MiB 0.25781250 MiB _list[:] = [change_to_string(x) for x in _list]
20 39.01953125 MiB 0.00000000 MiB _list[:] = [average_word_length(x) for x in _list]
Filename: temp2.py
Line # Mem usage Increment Line Contents
================================================
23 32.42187500 MiB 32.42187500 MiB @profile(precision=8)
24 def replace_list(_list):
25 32.42187500 MiB 0.00000000 MiB _list = [normalise_number(x) for x in _list]
26 39.11328125 MiB 0.25781250 MiB _list = [change_to_string(x) for x in _list]
27 39.11328125 MiB 0.00000000 MiB _list = [average_word_length(x) for x in _list]
28 32.46484375 MiB 0.00000000 MiB return _list
What I found is that even if I increase the list size to 100,000, reassignment consistently uses more memory, but only by around 1%. This makes me think that the additional memory cost is probably just an extra pointer somewhere, not the cost of an entire list.
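That would be consistent with the fact that a list object itself only holds pointers to its elements, so the list structure is cheap compared with the element objects. A rough way to see the relative sizes (a sketch reusing normalise_number and change_to_string from the script above):

import sys

strings = [change_to_string(normalise_number(x)) for x in range(1000)]

# Size of the list object itself: roughly 1000 pointers plus a small header.
print(sys.getsizeof(strings))

# Combined size of the string objects the list points to -- far larger.
print(sum(sys.getsizeof(s) for s in strings))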
To further test the hypothesis, I performed time-based profiling at intervals of 0.00001 seconds and graphed the results. I wanted to see whether there was perhaps a momentary spike in memory usage that disappeared instantly due to garbage collection (reference counting), but I have not found such a spike.
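Since a sampling profiler can miss a spike that only lasts a fraction of a millisecond, a follow-up I could try is tracemalloc, which records the true peak of Python-level allocations. A sketch, reusing mutate_list and replace_list from the script above:

import tracemalloc

def peak_allocation(func, size=100_000):
    data = list(range(size))
    tracemalloc.start()
    func(data)
    _, peak = tracemalloc.get_traced_memory()   # returns (current, peak) in bytes
    tracemalloc.stop()
    return peak

print("mutate_list peak: ", peak_allocation(mutate_list))
print("replace_list peak:", peak_allocation(replace_list))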
Can anyone explain these results? What exactly is happening under the hood here that causes this slight but consistent increase in memory usage?