
Say I have a considerably large list my_list in Python, and I want to empty it. I can do that either by deleting its contents or by assigning a new list to my_list. Which is the better approach?

my_list = range(1, 10000)

method 1:

my_list = list() 
print len(my_list) # prints 0

method 2:

del my_list[:] 
print len(my_list) # prints 0

I feel that method 2 is the more appropriate way of doing this. Am I right?

Anuvrat Parashar
  • Why do you want to do it? – sateesh Jun 15 '13 at 08:15
  • I think you are working from C-style assumptions. In C, memory leaks are easy to create if you are careless. In Python, memory leaks are very hard to create; the garbage collector collects any unreferenced objects each time it triggers. You should not think at all about handling memory allocation yourself; this is unnecessary and not how Python works. Thinking about 'what is more efficient' is enough. – kampu Jun 15 '13 at 08:21
  • @kampu The [Python garbage collector](http://docs.python.org/2/library/gc.html) is an _optional_ run-time component of Python. Its main use is to collect unreachable objects -- mainly due to circular dependencies. In normal usage, the _reference counting_ mechanism is used to deallocate objects' memory. – Sylvain Leroux Jun 15 '13 at 08:32
  • @SylvainLeroux: That's actually why I said 'garbage collection' -- as a reference to .. reference counting. However, it turns out that 'reference counting' is not as GC-like as I thought (in particular, deallocating at the exact time the reference count drops below 1, rather than batching deallocations up), so I concede the point – kampu Jun 15 '13 at 08:40
  • There are two related questions: http://stackoverflow.com/questions/1400608/how-to-empty-a-list-in-python and http://stackoverflow.com/questions/850795/clearing-python-lists – doctorlove Jun 15 '13 at 09:49

2 Answers


Internally, Python (specifically CPython, the reference implementation) uses a mechanism called reference counting to keep track of whether an object is still accessible. Each time a new "variable" references an object, the object's reference counter is incremented. Each time a "variable" ceases to reference an object, the object's reference counter is decremented. When the reference counter reaches 0, the object is deleted (its "deallocation function" is invoked): http://docs.python.org/2/c-api/refcounting.html
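One rough way to watch the counter move in CPython is sys.getrefcount. The sketch below is only an illustration: the names data and alias are made up here, list(range(...)) is used so that it also runs on Python 3, and the absolute numbers reported by sys.getrefcount vary between interpreter versions, so only the differences are compared.

```python
import sys

# Build the list and bind one name to it.
data = list(range(1, 10000))

# getrefcount itself holds a temporary reference while it runs,
# so treat the value as a baseline rather than an exact count.
before = sys.getrefcount(data)

# A second name for the same object adds one reference.
alias = data
after_alias = sys.getrefcount(data)

# Unbinding that name removes the reference again.
del alias
after_del = sys.getrefcount(data)

print(after_alias - before)  # difference caused by the extra name
print(after_del - before)    # back to the baseline
```

Comparing differences rather than absolute values keeps the example independent of how a given interpreter counts its own temporary references.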

For example, this creates a "big" list, which is deleted almost as soon as it is created, since there is no variable to "increase" its reference counter:

range(1, 10000)

This creates a new list, lets you reference it through my_list, and sets the reference counter of the list to 1:

my_list = range(1, 10000)

The following statement now decrements the reference counter of the list. Assuming you have no other references to it, the counter reaches 0 and so the list is deleted:

my_list = None
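To see that rebinding really does release the old list, one option is to observe it through a weak reference. This is a sketch under assumptions: plain lists cannot be weakly referenced, so TrackedList is a hypothetical subclass introduced only for this demonstration, and the immediate deallocation shown is CPython behaviour rather than a language guarantee.

```python
import weakref


class TrackedList(list):
    """Hypothetical list subclass; unlike plain lists, it supports weak references."""


my_list = TrackedList(range(1, 10000))

# A weak reference does not increase the reference counter.
probe = weakref.ref(my_list)
alive_before = probe() is not None

# Rebinding the only strong reference drops the counter to 0,
# so CPython deallocates the list right away.
my_list = None
alive_after = probe() is not None

print(alive_before, alive_after)
```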

A last example:

my_list = range(1, 10000)
del my_list[:]

This one creates a list of 9,999 items (range(1, 10000) stops before 10000), with a reference counter of 1. The second statement deletes the items of the list, but you still have one reference to a now-empty list. You see the difference?
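The difference becomes visible as soon as a second name refers to the same list. The following is a minimal sketch, with original and alias as made-up names and list(range(...)) used so it also runs on Python 3: rebinding leaves the other name's view untouched, while del on the full slice empties the one shared list object.

```python
original = list(range(1, 10000))
alias = original  # second reference to the same list object

# Method 1: rebinding. Only the name "original" changes;
# the old list survives because "alias" still references it.
original = list()
len_alias_after_rebind = len(alias)  # 9999

# Method 2: in-place deletion. The single shared list is emptied,
# so every name that references it sees an empty list.
original = alias
del original[:]
len_alias_after_del = len(alias)  # 0

print(len_alias_after_rebind, len_alias_after_del)
```

So if nothing else references the list, both methods end up freeing the items; if something else does, only del my_list[:] affects what the other references see.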


BTW, reference counting is a great mechanism for automatic deallocation, and it has the benefit of being deterministic (unlike the Java garbage collector). But there is one case where reference counting does not work: circular dependencies, where object A references object B, which in turn references object A. In that case, neither A's nor B's reference counter can reach 0 as long as the "circle" is not broken. But this is beyond your question, I assume. Anyway, for programs containing unmanaged circular dependencies, Python has an optional garbage collector to free such cycles. That garbage collector is enabled by default. It's easy to check:

>>> import gc
>>> gc.isenabled()
True
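The circular case described above can be sketched as follows. The Node class and the variable names are hypothetical, and the collector is switched off temporarily only so that an automatic collection cannot interfere with the demonstration; gc.collect() then runs a collection explicitly.

```python
import gc
import weakref


class Node(object):
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()  # keep automatic collections out of the picture

a = Node()
b = Node()
a.other = b
b.other = a  # a -> b -> a: a reference cycle

probe = weakref.ref(a)

# Dropping both names leaves the cycle unreachable, but each object
# still has a nonzero reference counter because of the other one.
del a, b
alive_after_del = probe() is not None

# The cycle collector finds and frees the unreachable pair.
gc.collect()
alive_after_collect = probe() is not None

if was_enabled:
    gc.enable()

print(alive_after_del, alive_after_collect)
```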

As a final note, even that garbage collector is limited, since it does not deallocate cycles containing objects with a finalizer (__del__). This limitation applies to Python versions before 3.4; PEP 442 lifted it in later versions. See the following link for the rationale: http://arctrix.com/nas/python/gc/

Sylvain Leroux
  • So it makes no difference whether I assign a new list or use delete. Thanks. – Anuvrat Parashar Jun 15 '13 at 09:06
  • Note that the *optional* garbage collector is in fact used automatically and functions quite nicely without the user ever importing the `gc` module. (You're probably aware of this, but it's not obvious from the answer's text.) Circular references are easily created and a system that doesn't dispose of them is not very useful. – user4815162342 Jun 15 '13 at 10:32
  • @user4815162342 I've updated my answer to explicitly state that the optional garbage collector is enabled by default. – Sylvain Leroux Jun 15 '13 at 17:21

Deleting is what I prefer, as both operations take nearly the same time. Also, del clears any memory reference associated with the list, whereas assigning a new list may result in the old memory not being cleared properly.

Aswin Murugesh
  • This is vague: "may result in the old memory not being cleared properly.". Do you mean it failing to be garbage collected? Because its actual bytes will not be cleared, regardless, until that piece of memory is allocated to another object (list, tuple, whatever), which then initializes it. – kampu Jun 15 '13 at 08:24