11

In short, I need to remove multiple items from a list according to their indexes. However, I can't use pop because it shifts the indexes (without some clumsy compensating system). Is there a way to remove multiple items simultaneously?

I have an algorithm that goes through a list, and if the conditions are right removes that item via the pop method. A problem arises seeing as this is all done in a loop. Once pop is done the list is shortened by one, displacing all the values by one. So the loop will go out of range. Is it possible to remove multiple items simultaneously, or another solution?

An example of my problem:

L = ['a', 'b', 'c', 'd']

for i in range(len(L)):
    print L
    if L[i] == 'a' or L[i] == 'c':
        L.pop(i)
Martin Thoma
  • 124,992
  • 159
  • 614
  • 958
rectangletangle
  • 50,393
  • 94
  • 205
  • 275
  • 1
    `[x for x in "abcd" if x not in [y for y in "ac"]]` – the wolf Mar 02 '11 at 05:51
  • If you remove the indexes starting from the biggest one you *can* use `list.pop`. So you could simply sort the indexes in descending order. This could be a lot faster than using filter/genexps if you have large lists and want to remove few items[e.g 5-10]. – Bakuriu Aug 28 '12 at 19:02

3 Answers3

17

Are your lists large? If so, use ifilter from itertools to filter out elements that you don't want lazily (with no up front cost).

Lists not so large? Just use a list comprehension:

 newlist = [x for x in oldlist if x not in ['a', 'c'] ]

This will create a new copy of the list. This is not generally an issue for efficiency unless you really care about memory consumption.

As a happy medium of syntax convenience and laziness ( = efficiency for large lists), you can construct a generator rather than a list by using ( ) instead of [ ]:

interestingelts = (x for x in oldlist if x not in ['a', 'c'])

After this, you can iterate over interestingelts, but you can't index into it:

 for y in interestingelts:    # ok
    print y

 print interestingelts[0]     # not ok: generator allows sequential access only
phooji
  • 10,086
  • 2
  • 38
  • 45
15

You want a list comprehension:

L = [c for c in L if c not in ['a', 'c']]

Or, if you really don't want to create a copy, go backwards:

for i in reversed(range(len(L))):
    if L[i] in ['a', 'c']:
        L.pop(i)    # del L[i] is more efficient

Thanks to ncoghlan for reversed() & phooji for del L[i] suggestions. (I decided to leave it as L.pop(i), since that's how the question was initially formulated.)

Also, as J.S. Sebastian correctly points out, going backwards is space efficient but time inefficient; most of the time a list comprehension or generator (L = (...) instead of L = [...]) is best.

Edit:

Ok, so since people seem to want something less ridiculously slow than the reversed method above (I can't imagine why... :) here's an order-preserving, in-place filter that should differ in speed from a list comprehension only by a constant. (This is akin to what I'd do if I wanted to filter a string in c.)

write_i = 0
for read_i in range(len(L)):
    L[write_i] = L[read_i]
    if L[read_i] not in ['a', 'c']:
         write_i += 1

del L[write_i:]
print L
# output: ['b', 'd']
senderle
  • 145,869
  • 36
  • 209
  • 233
  • Captain Obvious here: The second example is not how the majority of Python programmers would do this, I imagine :) – phooji Mar 02 '11 at 03:32
  • 1
    @phooji, lol, I have no doubt. But hey, I thought it was worth mentioning. – senderle Mar 02 '11 at 03:40
  • 2
    Since you're not using the return value for `L.pop(i)`, we could make this Even More Efficient by doing `del L[i]`... – phooji Mar 02 '11 at 03:50
  • 2
    @phooji: sometimes you need to make the modifications in place, in which case iterating through the indices in reverse is definitely the right way to do it – ncoghlan Mar 02 '11 at 04:02
  • 2
    @senderle: `reversed(range(len(L)))` is much easier to read than the multiple `-1` needed to create the reversed range directly. – ncoghlan Mar 02 '11 at 04:04
  • @ncoghlan: I know -- I was just poking fun at senderle, and pointing out that the backward loop is much less commonly used. – phooji Mar 02 '11 at 04:07
  • 1
    @ncoghlan, good point. That was bugging me, but for some reason using `reversed` on `range` didn't occur to me. – senderle Mar 02 '11 at 04:24
  • 1
    It annoyed me that `reversed(enumerate(L))` doesn't work, so I'll register a feature request to hopefully enable that in Python 3.3 – ncoghlan Mar 02 '11 at 06:14
  • 1
    "go backwards" with `del L[i]` is very slow for large inputs. It is `O(N**2)` due to `del L[i]` is `O(N)` for lists. http://stackoverflow.com/questions/5162991/list-manipulation-with-pop-python/5172425#5172425 – jfs Mar 02 '11 at 19:35
  • @J.F. Sebastian, but it must be more efficient than `l.pop(i)` right? That was what I took to be phooji's point. Certainly you're right that there's a space/time efficiency tradeoff between a backwards del loop and a list comprehension. – senderle Mar 02 '11 at 21:44
7

Summary

  • use list comprehension (or genexpr) to remove multiple items from a list
  • if your input is a large byte-string then use str.translate() to delete characters
  • deleting one item at a time del L[i] is slow for large lists

If items are bytes as in your example you could use str.translate():

def remove_bytes(bytestr, delbytes):
    """
    >>> remove_bytes(b'abcd', b'ac') == b'bd'
    True
    """
    return bytestr.translate(None, delbytes)

In general multiple items could be removed using slicing:

def remove_inplace_without_order(L, delitems):
    """Remove all items from `L` that are in `delitems` (not preserving order).

    >>> L = list(range(4)); remove_inplace_without_order(L, [0,2]); L
    [3, 1]
    """
    idel = len(L) # items idel.. to be removed
    for i in reversed(range(len(L))):
        if L[i] in delitems:
            idel -= 1
            L[i] = L[idel] # save `idel`-th item
    del L[idel:] # remove items all at once
    #NOTE: the function returns `None` (it means it modifies `L` inplace)

As @phooji and @senderle already mentioned list comprehension (or generator expression) is preferable in your case:

def remove_listcomp(L, delitems):
    return [x for x in L if x not in delitems]

Here's a performance comparison for L=list("abcd"*10**5); delitems="ac":

| function                     | time, msec |  ratio |
|------------------------------+------------+--------|
| list                         |       4.42 |    0.9 |
| remove_bytes                 |       4.88 |    1.0 |
| remove                       |       27.3 |    5.6 |
| remove_listcomp              |       36.8 |    7.5 |
| remove_inplace_without_order |       71.2 |   14.6 |
| remove_inplace_senderle2     |       83.8 |   17.2 |
| remove_inplace_senderle      |      15000 | 3073.8 |
#+TBLFM: $3=$2/@3$2;%.1f

Where

try:
    from itertools import ifilterfalse as filterfalse
except ImportError:
    from itertools import filterfalse # py3k

def remove(L, delitems):
    return filterfalse(delitems.__contains__, L)

def remove_inplace_senderle(L, delitems):
    for i in reversed(range(len(L))):
        if L[i] in delitems:
            del L[i]

def remove_inplace_senderle2(L, delitems):
    write_i = 0
    for read_i in range(len(L)):
        L[write_i] = L[read_i]
        if L[read_i] not in delitems:
             write_i += 1
    del L[write_i:]

remove_inplace_senderle() is slow due to it uses O(N**2) algorithm. Each del L[i] might cause all items to the right to be moved left to close the gap.

The time column in the above table includes time it takes to create a new input list (the first row) due to some algorithms modify an input inplace.

Here's timings for the same input but without creating a new list on each iteration:

 | function        | time, msec | ratio |
 |-----------------+------------+-------|
 | remove_bytes    |      0.391 |     1 |
 | remove          |       24.3 |    62 |
 | remove_listcomp |       33.4 |    85 |
 #+TBLFM: $3=$2/@2$2;%d

The table shows that itertools.ifilterfalse() doesn't provide a significant improvement over listcomp.

In general it is not worth it or even harmful to think about performance for such tasks unless a profiler proven that this code is a bottleneck and it is important for your program. But it might be useful to be aware of alternative approaches that could provide more than an order of magnitude improvement in speed.

Community
  • 1
  • 1
jfs
  • 399,953
  • 195
  • 994
  • 1,670