How do I remove the same items from a list? Say
A= [1, 2, 3, 8, 7, 8, 8, 7, 6]
And I want to remove all 8 how should I do this? if the number is not in order?
How do I remove the same items from a list? Say
A= [1, 2, 3, 8, 7, 8, 8, 7, 6]
And I want to remove all 8 how should I do this? if the number is not in order?
The easy way to do this is to build a new list, containing all of the items that aren't 8
. For example:
B=[item for item in A if item != 8]
If you really want to do it in-place (for example, because some other object has a reference to the same list as A
, and you want that other object to see the changes), you can. But it's trickier. For example, if you delete by indices, you will have to go backward at some point (because when you remove the first 8
, all of the later 8
s have new indices, so you always have to delete the last one first):
indices = [index for index, item in enumerate(A) if item==8]
for index in reversed(indices):
del A[index]
Or you could just keep trying to remove
until it fails, but this is both ugly and slow:
while True:
try:
A.remove(8)
except ValueError:
break
In fact, you're often better off still creating a new list, and then just mutating A
into a copy of that new list:
A[:]=[item for item in A if item != 8]
For a performance test, I ran this code against all of the proposed answers. The algorithms tested are:
Note that if you don't actually need to mutate A
—and usually you don't, you can just rebind the name—the copy-then-mutate algorithms become even simple and faster. But I didn't test that here.
I also didn't fully test the "while True: remove until error" algorithm, or Chris Barker's suggested variation in the comments, because they're obviously going to be so much slower that it's not worth testing. However, a very quick test shows that the variation is about 2x slower, as expected, and that both are orders of magnitude slower than anything else in almost any test case.
Anyway, the test is removing 0s from a random list of length 100K or 1M (at 1/10th as many reps), with values ranging from 0 up to 16, 256, or 65536 (the fewer distinct values, the higher the percentage of hits to remove).
If you're using CPython, the listcomp version is always fastest, especially so when N is large. In you're using PyPy, the in-place algorithms can beat it when N is huge and M is tiny, but in that case, they're all very fast. The fancy slide algorithm eliminates the quadratic behavior of the other in-place algorithms, but it also slows things down for the simple cases, so there's really no place where it's a clear winner. (It's also by far the most complicated—which is probably why it was the only one that wasn't right on the first attempt.) If you're absolutely sure you're only going to be removing a tiny number of copies, and you're using PyPy, consider the whiledel solution; in any other use case, or when you don't know the use case for sure, I'd use the listcomp.
64-bit python CPython 3.3.0
16 values 256 values 65536 values
100K 1000K 100K 1000K 100K 1000K
revind 0.188 17.3 0.085 1.23 0.074 0.080
whiledel 0.324 19.3 0.206 1.36 0.199 0.203
slide 0.091 0.54 0.097 0.54 0.095 0.538
genepxr 0.094 0.11 0.100 0.11 0.099 0.108
listcomp 0.070 0.08 0.073 0.08 0.071 0.079
filterer 0.081 0.09 0.080 0.09 0.835 0.088
64-bit python CPython 2.7.2
16 values 256 values 65536 values
100K 1000K 100K 1000K 100K 1000K
revind 0.198 17.1 0.089 1.23 0.088 0.955
whiledel 0.345 19.8 0.233 1.36 0.234 0.243
slide 0.095 0.54 0.099 0.55 0.095 0.551
genepxr 0.092 0.11 0.097 0.11 0.107 0.116
listcomp 0.091 0.09 0.099 0.08 0.105 0.114
filterer 0.122 0.23 0.132 0.09 0.135 0.150
64-bit python PyPy 1.9.0 (Python 2.7.2)
16 values 256 values 65536 values
100K 1000K 100K 1000K 100K 1000K
revind 0.266 28.5 0.027 1.97 0.018 0.013
whiledel 0.281 30.2 0.023 1.94 0.034 0.009
slide 0.022 0.39 0.015 0.022 0.006 0.018
genepxr 0.089 0.13 0.087 0.154 0.089 0.147
listcomp 0.052 0.08 0.057 0.073 0.052 0.073
filterer 0.054 0.07 0.053 0.078 0.048 0.074
It's a bit hard to predict the performance of slide. In large-N/small-M cases, you'd expect it to blow whiledel away, but it's actually slower. But if you think about it: that algorithm effectively replaces M linear moves with N simple copies. While the former is O(NM) and the latter is O(N), the multiplier on looping in C (or, even better, inside memmove
) is so much smaller than looping in Python that you can't ignore it unless N/M is huge (at which point all of the solutions are so fast that it scarcely matters). So, doing M Python loops and NM C loops can easily beat doing N Python loops.
Most of these answers suggest copying data. This can be undesired if your list is sufficiently large. You can easily use
while 8 in A: A.remove(8)
to do it without copying any data. However, this runs in quadratic time, which is also undesirable if your list is large. To do it in linear time and without copying any data, use:
def remove_all_from_list(L, n):
i = 0
while i < len(L):
if L[i] == n:
del L[i] # Do not increment i here, because L[i] will change
else:
i += 1
>>> A = [1, 2, 3, 8, 7, 8, 8, 7, 6]
>>> remove_all_from_list(A, 8)
>>> A
[1, 2, 3, 7, 7, 6]
Edit: @abarnert reminded me that del L[i] is O(N) so this is actually quadratic. Here's another attempt at an O(N) in-place solution...
def remove_all_from_list(L, n):
# indices takes *worse-case* O(N) space, but typical-case much less
indices = [i for i, x in enumerate(L) if x==n]
indices_seen = 0
num_indices = len(indices)
for i in xrange(len(L)):
while (indices_seen < num_indices and
i + indices_seen == indices[indices_seen]):
indices_seen += 1
if i+indices_seen >= len(L):
break
L[i] = L[i+indices_seen]
L[-indices_seen:] = []
This will do all the shuffling at the end, so each element gets moved at most once. I realize this will take exactly as much time as abarnert's copying method. I'm just trying to think of ways to reduce memory usage in case you have a very large list.
Final edit: Speed tests (not as comprehensive as @abarnert's)
import random
L = range(30)*10000
random.shuffle(L)
from copy import copy
for fn in [remove_1, remove_2, remove_3, remove_4, remove_5, remove_6]:
print fn.__name__
%timeit fn(copy(L), 8)
remove_1 # listcomp
10 loops, best of 3: 39.1 ms per loop
remove_2 # revind
1 loops, best of 3: 1.7 s per loop
remove_3 # try: remove; except: break
1 loops, best of 3: 65.7 s per loop
remove_4 # while n in L: L.remove(n)
1 loops, best of 3: 129 s per loop
remove_5 # whiledel
1 loops, best of 3: 1.87 s per loop
remove_6 # slide
1 loops, best of 3: 227 ms per loop
In [13]: A= [1, 2, 3, 8, 7, 8, 8, 7, 6]
In [14]: [i for i in A if i!=8]
Out[14]: [1, 2, 3, 7, 7, 6]
In [15]: filter(lambda i: i!=8, A)
Out[15]: [1, 2, 3, 7, 7, 6]