0

I have a very large list on each element of which I have to do many operations. Essentially, each element of the list is appended to in various ways and then used to generate an object. These objects are then used to generate another list.

Unfortunately, doing this in a naive way takes up all of available memory.

I would therefore like to do the following:

for a in b:
    # Do many things with a
    c.append(C(modified_a))
    b[b.index(a)] = None # < Herein lies the rub

This seems to violate the idea that a list should not be modified during iteration. Is there a better way to do this kind of manual garbage collecting?

Sandeep Raju Prabhakar
  • 18,652
  • 8
  • 35
  • 44
astex
  • 1,045
  • 10
  • 28
  • Please explain downrating in the comments, otherwise it doesn't help me pose better questions in the future. – astex Feb 14 '13 at 16:29

3 Answers3

2

This shouldn't be a problem, since you're just assigning new values to list elements, not really deleting them.

But instead of searching for a with the index method, you should probably use enumerate.

See also here: http://unspecified.wordpress.com/2009/02/12/thou-shalt-not-modify-a-list-during-iteration/ "Firstly, let me be clear that in this article, when I say “modify”, I mean inserting or removing items from the list. Merely updating or mutating the list items is fine."

Sebastian
  • 1,839
  • 12
  • 16
0

Your best bet is a generator:

def gen(b):
   for a in b:
      # Do many things with a
      yield a

Done properly here, no additional memory required.

dawg
  • 98,345
  • 23
  • 131
  • 206
  • This would work great for memory management if `b` did not have to be fully populated before I could start generating `c`. Perhaps I am confused, but this would, as stated, still require a full copy of `b` and `c` in memory, yes? – astex Feb 14 '13 at 16:55
  • Depends what you are doing with `c`. If `c` is only being used elsewhere as an interator, then this works great and no full copy of a big list needs to be made. If you need to subscript `c` like `c[n]` then you need the list. It would be better, if memory is an issue, to rethink your overall approach so that generators can be used throughout the program -- if possible. If not, don't use `b[b.index(a)]` since that is very slow as Pyson pointed out. – dawg Feb 14 '13 at 17:36
  • `c` is the final product. It must be returned as a list. In addition, the code to be run on `a` depends on an instance of Selenium and is therefore incredibly slow. As I said, this is not a major time bottleneck (searching via index certainly is not). I only need this to be memory efficient. It should also be noted that Sebastian mentioned using enumerate before Pyson. – astex Feb 14 '13 at 17:50
  • `c is the final product. It must be returned as a list` OK, do what your doing, and it does not matter if you change list elements while iterating -- only the list length. `It should also be noted that Sebastian mentioned using enumerate before Pyson.` I am confused - Why are you telling me this? I am referring to times in Pyson's post; not trying to get into any dispute y'all may have. – dawg Feb 14 '13 at 18:37
-1

There are several issues with your code.

First, assigning None to a list element does not delete it:

>>> l=[1,2,3,4,5,6,6,7,8,9]
>>> len(l)
10
>>> l[l.index(5)]=None
>>> l
[1, 2, 3, 4, None, 6, 6, 7, 8, 9]
>>> len(l)
10

Second, using an index to find the element that you want to change is not at all efficient way to do this.

You can use enumerate, but you would still need to loop through to delete the None values.

for i,a in enumerate(b):
    # Do many things with a
    b[i]=C(modified_a)
    b[i]=None 
c=[e for e in b if e is not None]

You could use a list comprehension to just copy the new 'a' values to the c list then delete b:

c=[do_many_things(a) for a in b]
del b                              # will still occupy memory if not deleted...

Or if you want b to be modified in place, you can use slice assignment:

b[:]=[do_many_things(a) for a in b]

Slice assignment works this way:

#shorted a list
>>> b=[1,2,3,4,5,6,7,8,9]
>>> b[2:7]=[None]
>>> b
[1, 2, None, 8, 9]

#expand a list
>>> c=[1,2,3]
>>> c[1:1]=[22,33,44]
>>> c
[1, 22, 33, 44, 2, 3]

# modify in place
>>> c=[1,2,3,4,5,6,7]
>>> c[0:7]=[11,12,13,14,15,16,17]
>>> c
[11, 12, 13, 14, 15, 16, 17]

You can use it in a list comprehension like so:

>>> c=list(range(int(1e6)))
>>> c[:]=[e for e in c if e<10]
>>> c
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

One of the comments pointed out that slice assignment does not modify in place exactly; that a temp list is generated. That is true. However, let's look at the total timings here:

import time
import random
fmt='\t{:25}{:.5f} seconds' 
count=int(1e5)
a=[random.random() for i in range(count)]
b=[e for e in a]

t1=time.time()
for e in b:
    if e<0.5: b[b.index(e)]=None  
c=[e for e in b if e is not None]    
print(fmt.format('index, None',time.time()-t1))

b=[e for e in a]
t1=time.time()
for e in b[:]:
    if e<0.5: del b[b.index(e)]  
print(fmt.format('index, del',time.time()-t1))

b=[e for e in a]
t1=time.time()
for i,e in enumerate(b[:]):
    if e<0.5: b[i]=None
c=[e for e in b if e is not None]    
print(fmt.format('enumerate, copy',time.time()-t1))

t1=time.time()
c=[e for e in a if e<.5]
del a
print(fmt.format('c=',time.time()-t1))

b=[e for e in a]
t1=time.time()
b[:]=[e for e in b if e<0.5]
print(fmt.format('a[:]=',time.time()-t1))

On my computer, prints this:

index, None              87.30604 seconds
index, del               28.02836 seconds
enumerate, copy          0.02923 seconds
c=                       0.00862 seconds
a[:]=                    0.00824 seconds

Or, use numpy for more optimized array options if this does not help.

Community
  • 1
  • 1
  • Can you explain what this [:] is good for? I'd think this will create an intermediate list (the result of the right hand side of the assignment) and afterwards, it doesn't make much difference if you assign to b or to b[:]. – Sebastian Feb 13 '13 at 20:16
  • 2
    I'm no expert here, but according to this answer, you're wrong: http://stackoverflow.com/a/4948508/1413374 – Sebastian Feb 13 '13 at 20:29
  • The list comprehension does create a new list. The slice assignment then copies that list's contents into the list referenced by `b`. –  Feb 13 '13 at 20:30
  • I have no intention of deleting the list element. In fact, someone else's code within the same project later requires that the list is of the same length, but with the processed values replaced with None. This answer has also turned into a bit of a flame war. Please note also that I do not care about computation time, but, as stated in the question, memory usage. Creating a second copy of `b` is out of the question. – astex Feb 14 '13 at 16:20
  • @astex: If you have 'no intention of deleting the list element' why is the title of your question 'Deleting elements of a python list during iteration'? Please also explain 'flame war'? I don't think I am trying to flame anybody for anything. Four of the 5 solutions I gave you do NOT involve creation of any temp list at all. `list[:]` does, so don't use that. Sorry if there is some confusion over intent -- I am just trying to help out. –  Feb 15 '13 at 15:11