156

I am looking for a way to remove all values within a list from another list.

Something like this:

a = range(1,10)  
a.remove([2,3,7])  
print a  
a = [1,4,5,6,8,9]  
martineau
ariel

7 Answers

167
>>> a = range(1, 10)
>>> [x for x in a if x not in [2, 3, 7]]
[1, 4, 5, 6, 8, 9]
Remi Guan
YOU
  • 17
    What if I have a list `[1,2,2,2,3,4]` and a sublist `[2,3]`, and the result should be `[1,2,2,4]`? Is there a Pythonic way to do that? – user Mar 02 '14 at 05:20
  • @user this gets you most of the way there - but your problem is a different problem! l=[1,2,2,3,4] sl=[2,3] [x for x in [l[n:n+2] for n in range(0,len(l))[::2] ] if x != sl] – jsh Mar 23 '16 at 14:21
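For the case raised in the comments above (drop only one occurrence per item to remove, and keep the order), here is a minimal sketch using `collections.Counter`. It is written for Python 3, the name `remove_once` is made up for illustration, and it assumes the items are hashable:

```python
from collections import Counter

def remove_once(items, to_remove):
    """Drop one occurrence from items for each element of to_remove, keeping order."""
    pending = Counter(to_remove)  # how many copies of each value still need removing
    result = []
    for x in items:
        if pending[x] > 0:
            pending[x] -= 1  # consume one pending removal instead of keeping x
        else:
            result.append(x)
    return result

print(remove_once([1, 2, 2, 2, 3, 4], [2, 3]))  # [1, 2, 2, 4]
```

This makes a single pass over the list, so it stays linear in the length of the input.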
90

I was looking for a fast way to do this, so I ran some experiments with the suggested approaches. The results surprised me, so I want to share them.

The experiments were done using the pythonbenchmark tool, with:

a = range(1,50000) # Source list
b = range(1,15000) # Items to remove

Results:

def comprehension(a, b):
    return [x for x in a if x not in b]

5 tries, average time 12.8 sec

def filter_function(a, b):
    return filter(lambda x: x not in b, a)

5 tries, average time 12.6 sec

def modification(a,b):
    for x in b:
        try:
            a.remove(x)
        except ValueError:
            pass
    return a

5 tries, average time 0.27 sec

def set_approach(a,b):
    return list(set(a)-set(b))

5 tries, average time 0.0057 sec

I also ran another measurement with larger inputs for the last two functions:

a = range(1,500000)
b = range(1,100000)

And the results:

For modification (remove method): average time is 252 seconds
For set approach: average time is 0.75 seconds

So the set approach is significantly faster than the others. It does not keep duplicate items (or the original order), but if you don't need that, it's the one to use. There is almost no difference between the list comprehension and the filter function. Using 'remove' is about 50 times faster than those, but it modifies the source list. The best choice is sets: more than 1000 times faster than the list comprehension!
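One likely reason the comprehension and filter timings above are so slow is that b is a list, so every `x not in b` test has to scan it. Here is a rough sketch comparing list membership with set membership inside the same comprehension. It is written for Python 3, uses smaller inputs than the benchmark so it finishes quickly, and the function names are mine; timings will vary by machine, so none are quoted:

```python
import timeit

a = list(range(1, 5000))   # source list (smaller than the benchmark above)
b = list(range(1, 1500))   # items to remove

def comprehension_list(a, b):
    # Each membership test scans the list b.
    return [x for x in a if x not in b]

def comprehension_set(a, b):
    # Build the set once; each membership test is then a hash lookup.
    b_set = set(b)
    return [x for x in a if x not in b_set]

# Both variants keep order and duplicates, and give the same result.
assert comprehension_list(a, b) == comprehension_set(a, b)

for fn in (comprehension_list, comprehension_set):
    seconds = timeit.timeit(lambda: fn(a, b), number=5)
    print(fn.__name__, seconds)
```

Unlike `list(set(a)-set(b))`, the set-membership comprehension preserves the order and duplicates of a.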

The Godfather
  • very interesting. I would not have used set, intuitively the conversion should add overhead. apparently my intuition was wrong. thanks for the insight – lhk Dec 07 '16 at 09:58
  • 4
    Very good answer, thanks! Sets are much faster because the time to find an item is constant on average, since a Python set is implemented as a hash table. Removing an item from a set therefore takes almost no time to locate it, whereas in a list the item has to be found first by scanning. – Guillem Cucurull Aug 15 '18 at 21:24
  • Definitely useful! Thinking on the lines of set, makes it so natural. – Prem Aug 11 '19 at 06:30
49

If you don't have repeated values, you could use set difference.

x = set(range(10))
y = x - set([2, 3, 7])
# y = set([0, 1, 4, 5, 6, 8, 9])

and then convert back to list, if needed.
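A minimal sketch of the conversion back to a list, written for Python 3 and assuming the values are unique and sortable. `sorted()` is my choice here, because a set does not guarantee any particular order:

```python
a = list(range(10))
to_remove = [2, 3, 7]

# Set difference drops the unwanted values; sorted() returns a list in ascending order.
result = sorted(set(a) - set(to_remove))
print(result)  # [0, 1, 4, 5, 6, 8, 9]
```

If the original list was not sorted to begin with, this does not restore its order; it only gives a predictable one.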

arunjitsingh
  • 3
    Note that this will shuffle the resulting list. – Neal Ehardt Dec 04 '13 at 00:59
  • 1
    The order of the list may change, but in a deterministic way. It is not "shuffled" in the random sense. – dansalmo Dec 25 '13 at 18:36
  • 4
    also, if your original list x has duplicates, after the set() operation, only one is saved. – fast tooth May 16 '14 at 19:34
  • @dansalmo it ends up sorted by a conjunction of values depending on the implementation of the set and of the state of various memory constraints present when creating the buckets. Quite shuffled I would say. – njzk2 Jul 30 '15 at 03:37
30
a = range(1,10)
itemsToRemove = set([2, 3, 7])
b = filter(lambda x: x not in itemsToRemove, a)

or

b = [x for x in a if x not in itemsToRemove]

Don't create the set inside the lambda or inside the comprehension. If you do, it'll be recreated on every iteration, defeating the point of using a set at all.
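To make the point about re-creating the set concrete, here is a small Python 3 sketch. `build_set` is a hypothetical helper that only exists to count how often a set gets constructed:

```python
build_count = 0

def build_set(items):
    # Hypothetical helper: wraps set() and counts how many times it is called.
    global build_count
    build_count += 1
    return set(items)

a = list(range(1, 10))

# Set built inside the comprehension: the condition is evaluated once per element of a.
build_count = 0
inside = [x for x in a if x not in build_set([2, 3, 7])]
inside_builds = build_count
print(inside_builds)  # 9

# Set built once, before the comprehension.
build_count = 0
items_to_remove = build_set([2, 3, 7])
outside = [x for x in a if x not in items_to_remove]
outside_builds = build_count
print(outside_builds)  # 1
```

Both versions produce the same list; the only difference is how much work is repeated.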

user2357112
Yaroslav
9

The simplest way is

>>> a = range(1, 10)
>>> for x in [2, 3, 7]:
...  a.remove(x)
... 
>>> a
[1, 4, 5, 6, 8, 9]

One possible problem here is that each time you call remove(), all the items are shuffled down the list to fill the hole. So if a grows very large this will end up being quite slow.

This way builds a brand new list. The advantage is that we avoid all the shuffling of the first approach:

>>> removeset = set([2, 3, 7])
>>> a = [x for x in a if x not in removeset]

If you want to modify a in place, just one small change is required:

>>> removeset = set([2, 3, 7])
>>> a[:] = [x for x in a if x not in removeset]
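The difference between `a = ...` and `a[:] = ...` only shows up when something else refers to the same list. A short Python 3 sketch (the names `alias`, `b` and `alias_b` are just for illustration):

```python
removeset = {2, 3, 7}

# Slice assignment replaces the contents of the existing list object.
a = list(range(1, 10))
alias = a  # second name bound to the same list
a[:] = [x for x in a if x not in removeset]
print(alias)       # [1, 4, 5, 6, 8, 9]
print(alias is a)  # True

# Plain assignment rebinds the name to a new list; the old list is untouched.
b = list(range(1, 10))
alias_b = b
b = [x for x in b if x not in removeset]
print(alias_b)       # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(alias_b is b)  # False
```

So the in-place form matters mainly when the list was passed in from a caller or is stored somewhere else.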
John La Rooy
  • @gnibbler, Your claim *"So if `a` grows very large this will end up being quite slow."* is a bit misleading. If only the length of `a` is unbounded, all of the snippets you provided are O(n). The **real** problem with `remove` is that it only removes *the first occurrence* of its arguments, not all occurrences. It is also generally more in keeping with writing clear, idiomatic code to make a new list rather than mutating the old one. – Mike Graham Mar 25 '10 at 15:55
  • @Mike, I attempted to keep the answer simple as the OP has used the beginner tag. – John La Rooy Mar 25 '10 at 16:11
  • 4
    "simple" is no excuse for *incorrect*. – Mike Graham Mar 25 '10 at 16:24
6

Others have suggested ways to build a new list by filtering, e.g.

newl = [x for x in l if x not in [2,3,7]]

or

newl = filter(lambda x: x not in [2,3,7], l) 

But from your question it looks like you want in-place modification. For that you can do the following, which will also be much faster if the original list is long and there are few items to remove:

l = range(1,10)
for o in set([2,3,7,11]):
    try:
        l.remove(o)
    except ValueError:
        pass

print l

output: [1, 4, 5, 6, 8, 9]

I am checking for the ValueError exception so it works even if items are not in the original list.

Also, if you do not need in-place modification, the solution by S.Mark is simpler.

Anurag Uniyal
  • if you really need in-place modification, the previous answers can be modified to: `a[:] = [x for x in a if x not in [2,3,7]]`. This will be faster than your code. – Seth Johnson Mar 25 '10 at 13:35
  • 2
    Yes, a[:] can be used, but it is not obvious that it will be faster. For long lists with few values to remove, my code will be much faster, e.g. try it with list to remove = [1] :) – Anurag Uniyal Mar 25 '10 at 14:01
  • @Anurag: You seem to be right; timing tests make it look like removing in place is faster. – Seth Johnson Mar 25 '10 at 14:49
  • What you need to do if you want to use `remove` is loop calling `l.remove` over and over until you get `ValueError` and at that point break that loop. That would account for the case that there are multiple occurrences of a value in the list. (The better solution is still your first one, though.) – Mike Graham Mar 25 '10 at 15:57
  • @Seth Johnson, Premature optimization much? – Mike Graham Mar 25 '10 at 15:57
  • @Mike: No, but I had hoped that the cleaner (shorter one-line) version would be better. I don't use Python when I want optimized code. I use C++ with SWIG. :) – Seth Johnson Mar 25 '10 at 18:00
  • @Seth Johnson, "Better" doesn't mean fastest, especially not when a piece of code isn't proven to be slowing down an application. Criteria like correctness (which the `remove` option does not qualify for), readability, testability, and maintainability are almost always more important. – Mike Graham Mar 25 '10 at 18:28
  • The `remove` idea is great for lists containing `dict` objects. – rob Mar 09 '16 at 19:22
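The suggestion in the comments above (keep calling `remove` until it raises `ValueError`) can be sketched like this. It is a Python 3 sketch and the function name `remove_all_in_place` is made up:

```python
def remove_all_in_place(lst, values):
    """Remove every occurrence of each value from lst, mutating lst."""
    for value in values:
        while True:
            try:
                lst.remove(value)  # removes only the first occurrence
            except ValueError:
                break  # no occurrences of value are left
    return lst

l = [1, 2, 2, 3, 4, 2, 7]
remove_all_in_place(l, [2, 3, 7, 11])
print(l)  # [1, 4]
```

Because `remove` only relies on equality, this also works for unhashable items such as dicts, where a set-based filter would not. The cost is that every `remove` call rescans the list from the start.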
6
>>> a=range(1,10)
>>> for i in [2,3,7]: a.remove(i)
...
>>> a
[1, 4, 5, 6, 8, 9]

>>> a=range(1,10)
>>> b=map(a.remove,[2,3,7])
>>> a
[1, 4, 5, 6, 8, 9]
ghostdog74
  • 1
    Don't use `map` for side effects. `map` is for collecting the result of a bunch of calls. For loops are the tool for doing something a bunch of times. – Mike Graham Mar 25 '10 at 15:58
  • 1
    If what you mean by side effects is the `None` values returned by `map`, then they can be masked off. Other than that, it's still valid, and I like the conciseness of it. – ghostdog74 Mar 25 '10 at 16:10
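One more practical reason to avoid `map` for side effects: in Python 3, `map` returns a lazy iterator, so the `map(a.remove, ...)` version above does nothing until the result is consumed. A short sketch (the names `pending` and `before` are mine):

```python
a = list(range(1, 10))

pending = map(a.remove, [2, 3, 7])  # lazy in Python 3: nothing is removed yet
before = list(a)
print(before)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

list(pending)  # consuming the iterator finally runs the removals
print(a)       # [1, 4, 5, 6, 8, 9]
```

A plain for loop has the same effect in both Python 2 and Python 3 and makes the intent obvious.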