303

I need to compare two lists in order to create a new list of specific elements found in one list but not in the other. For example:

main_list = []
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"] 

I want to loop through list_1 and append to main_list all the elements from list_2 that are not found in list_1.

The result should be:

main_list = ["f", "m"]

How can I do it with Python?

mkrieger1
CosimoCD
  • 2
    Are you looking for elements in `list_2` that appear nowhere in `list_1` or elements in `list_2` that are not present at the same index in `list_1`? – Patrick Haugh Dec 13 '16 at 16:30

10 Answers

425

You can use sets:

main_list = list(set(list_2) - set(list_1))

Output:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> set(list_2) - set(list_1)
set(['m', 'f'])
>>> list(set(list_2) - set(list_1))
['m', 'f']

Per @JonClements' comment, here is a tidier version:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> list(set(list_2).difference(list_1))
['m', 'f']
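
A minimal sketch of what the set-based approach keeps and drops, using the question's lists plus a hypothetical duplicate "m" added for illustration. Set difference collapses duplicates and returns elements in arbitrary order, so the sketch sorts the result to make it reproducible:

```python
list_1 = ["a", "b", "c", "d", "e"]
# Hypothetical variant of list_2 with "m" repeated, for illustration only.
list_2 = ["a", "f", "c", "m", "m"]

# Set difference: unique elements of list_2 that are absent from list_1.
unique_diff = set(list_2).difference(list_1)

# Sets are unordered, so sort when a stable order matters.
main_list = sorted(unique_diff)
print(main_list)  # ['f', 'm']
```

If duplicates or the original order of list_2 matter, a set difference is probably not the right tool.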
nrlakin
  • 4
    This is good if we only care about `unique` elements but what if we have multiple `m's` for example this would not pick it up. – Chinny84 Dec 13 '16 at 16:28
  • That's true. I assumed the poster was looking for unique elements. I suppose it depends on what he means by "specific". – nrlakin Dec 13 '16 at 16:31
  • Indeed p.s. I did not down vote your answer, especially for an unclear original question. – Chinny84 Dec 13 '16 at 16:32
  • 17
    You could write this as `list(set(list_2).difference(list_1))` which avoids the explicit `set` conversion... – Jon Clements Dec 13 '16 at 16:32
  • No worries! Thanks @leaf for the formatting assist. – nrlakin Dec 13 '16 at 16:33
  • @JonClements that is a bit cleaner; I will add. – nrlakin Dec 13 '16 at 16:38
  • so `set() - set()` basically removes any value from the former `set()` that is in the latter `set()`, reduces the output to unique values only, and then spits it out as a dictionary? – oldboy May 23 '21 at 22:52
  • Personally, you should go with the built-in `.difference` method. The following link provides interesting examples: https://betterprogramming.pub/a-visual-guide-to-set-comparisons-in-python-6ab7edb9ec41 – Greg7000 Jul 03 '23 at 13:47
201

TL;DR:
SOLUTION (1)

import numpy as np
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`

SOLUTION (2) You want a sorted list

def setdiff_sorted(array1,array2,assume_unique=False):
    ans = np.setdiff1d(array1,array2,assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans
main_list = setdiff_sorted(list_2,list_1)




EXPLANATIONS:
(1) You can use NumPy's setdiff1d (array1,array2,assume_unique=False).

assume_unique tells the function whether the input arrays are already unique.
If False, the unique elements are determined first.
If True, the function assumes the elements are already unique and skips that step.

This yields the unique values in array1 that are not in array2. assume_unique is False by default.

If you are concerned with the unique elements (based on the response of Chinny84), then simply use (where assume_unique=False => the default value):

import numpy as np
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"] 
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`


(2) For those who want answers to be sorted, I've made a custom function:

import numpy as np
def setdiff_sorted(array1,array2,assume_unique=False):
    ans = np.setdiff1d(array1,array2,assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans

To get the answer, run:

main_list = setdiff_sorted(list_2,list_1)

SIDE NOTES:
(a) Solution 2 (custom function setdiff_sorted) returns a list (compared to an array in solution 1).

(b) If you aren't sure whether the elements are unique, just use the default setting of NumPy's setdiff1d in both solutions 1 and 2. For an example of a complication, see note (c).

(c) Things will be different if either of the two lists contains duplicates.
Say list_2 contains a duplicate: list_2 = ["a", "f", "c", "m", "m"]. Keep list_1 as is: list_1 = ["a", "b", "c", "d", "e"]
Keeping the default value of assume_unique yields ["f", "m"] (in both solutions). HOWEVER, if you set assume_unique=True, both solutions give ["f", "m", "m"]. Why? Because the caller ASSERTED that the elements are unique, so the deduplication step is skipped. Hence, IT IS BETTER TO KEEP assume_unique at its default value. Note that both answers are sorted.
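
For readers without NumPy, the default behaviour described above (unique values of the first input that are absent from the second, in sorted order) can be approximated with the standard library. This is a rough sketch, not NumPy's implementation; the helper name setdiff_sorted_py is made up for illustration, and it returns a plain list rather than an array:

```python
def setdiff_sorted_py(seq1, seq2):
    """Return the sorted unique items of seq1 that do not occur in seq2.

    Hypothetical stdlib stand-in for np.setdiff1d with the default
    assume_unique=False; items must be hashable and mutually orderable.
    """
    return sorted(set(seq1).difference(seq2))


list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m", "m"]  # duplicate "m", as in note (c)

print(setdiff_sorted_py(list_2, list_1))  # ['f', 'm']
```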

JP Maulion
  • If your lists are already ordered, this will also return an ordered list. The native solution of converting to sets and then getting the difference (solutions shown below) returns an unordered list which may make it harder to visually examine your results. – Doubledown Nov 29 '18 at 00:04
  • 1
    Hi, @Doubledown! Your concern has been addressed in the edited post. Hope this helps! – JP Maulion Nov 19 '19 at 14:56
91

Use a list comprehension like this:

main_list = [item for item in list_2 if item not in list_1]

Output:

>>> list_1 = ["a", "b", "c", "d", "e"]
>>> list_2 = ["a", "f", "c", "m"] 
>>> 
>>> main_list = [item for item in list_2 if item not in list_1]
>>> main_list
['f', 'm']

Edit:

As mentioned in the comments below, the above is not ideal for large lists. In that case, a better option is to convert list_1 to a set first:

set_1 = set(list_1)  # this reduces the lookup time from O(n) to O(1)
main_list = [item for item in list_2 if item not in set_1]
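
The set-backed comprehension can be wrapped in small helpers. The sketch below assumes hashable elements; the function names are illustrative and not from any library. Unlike a set difference, it keeps the original order of list_2 and any duplicates, and the second helper returns positions rather than values (the enumerate idea raised in the comments):

```python
def items_not_in(candidates, reference):
    """Items of `candidates` absent from `reference`; order and duplicates kept."""
    reference_set = set(reference)  # built once; membership tests are ~O(1)
    return [item for item in candidates if item not in reference_set]


def indices_not_in(candidates, reference):
    """Indices in `candidates` whose items are absent from `reference`."""
    reference_set = set(reference)
    return [i for i, item in enumerate(candidates) if item not in reference_set]


list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]

print(items_not_in(list_2, list_1))    # ['f', 'm']
print(indices_not_in(list_2, list_1))  # [1, 3]
```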
ettanany
  • 3
    Note: For larger `list_1`, you'd want to preconvert to a `set`/`frozenset`, e.g. `set_1 = frozenset(list_1)`, then `main_list = [item for item in list_2 if item not in set_1]`, reducing the check time from `O(n)` per item to (roughly) `O(1)`. – ShadowRanger Dec 13 '16 at 16:35
  • @ettanany Please beware if you try the solution as ettanany posted. I tried ettanany's solution as is and it is indeed super slow for a larger list. Can you update the answer to incorporate shadowranger's suggestion? – Doubledown Apr 11 '19 at 18:50
  • 1
    Would it be possible getting the index, instead of the string? – JareBear Apr 09 '20 at 20:11
  • 1
    @JareBear You can use `enumerate()` for that: `[index for (index, item) in enumerate(list_2) if item not in list_1]` – ettanany Apr 10 '20 at 09:40
  • @ettanany's thank you very much!! I'll implement that asap, I had done it. But your code looks so much cleaner. – JareBear Apr 10 '20 at 13:30
87

Not sure why the above explanations are so complicated when you have native methods available:

main_list = list(set(list_2)-set(list_1))
A.Kot
12

If you want a one-liner solution (ignoring imports) that only requires O(max(n, m)) work for inputs of length n and m, not O(n * m) work, you can do so with the itertools module:

from itertools import filterfalse

main_list = list(filterfalse(set(list_1).__contains__, list_2))

This takes advantage of the functional functions taking a callback function on construction, allowing it to create the callback once and reuse it for every element without needing to store it somewhere (because filterfalse stores it internally); list comprehensions and generator expressions can do this, but it's ugly.†

That gets the same results in a single line as:

main_list = [x for x in list_2 if x not in list_1]

with the speed of:

set_1 = set(list_1)
main_list = [x for x in list_2 if x not in set_1]

Of course, if the comparisons are intended to be positional, so:

list_1 = [1, 2, 3]
list_2 = [2, 3, 4]

should produce:

main_list = [2, 3, 4]

(because no value in list_2 has a match at the same index in list_1), you should definitely go with Patrick's answer, which involves no temporary lists or sets (even with sets being roughly O(1), they have a higher "constant" factor per check than simple equality checks) and involves O(min(n, m)) work, less than any other answer, and if your problem is position sensitive, is the only correct solution when matching elements appear at mismatched offsets.

†: The way to do the same thing with a list comprehension as a one-liner would be to abuse nested looping to create and cache value(s) in the "outermost" loop, e.g.:

main_list = [x for set_1 in (set(list_1),) for x in list_2 if x not in set_1]

which also gives a minor performance benefit on Python 3 (because now set_1 is locally scoped in the comprehension code, rather than looked up from nested scope for each check; on Python 2 that doesn't matter, because Python 2 doesn't use closures for list comprehensions; they operate in the same scope they're used in).
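
As a quick sanity check of the claim that the filterfalse one-liner matches the plain comprehension, here is a small sketch on randomly generated data. The list sizes, value range, and seed are arbitrary assumptions:

```python
import random
from itertools import filterfalse

random.seed(0)  # arbitrary seed so the run is repeatable
list_1 = [random.randrange(1000) for _ in range(500)]
list_2 = [random.randrange(1000) for _ in range(500)]

# O(n * m): every membership test scans list_1.
naive = [x for x in list_2 if x not in list_1]

# O(max(n, m)): one set build, then ~O(1) membership tests.
fast = list(filterfalse(set(list_1).__contains__, list_2))

print(naive == fast)  # True
```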

ShadowRanger
5
main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]

for i in list_2:
    if i not in list_1:
        main_list.append(i)

print(main_list)

output:

['f', 'm']
Taufiq Rahman
  • Like [the equivalent list comprehension based solution](http://stackoverflow.com/a/41125957/364696), this will be slow if `list_1` is large, and `list_2` is of non-trivial size, because it involves `len(list_2)` `O(n)` scans of `list_1`, making it `O(n * m)` (where `n` and `m` are the lengths of `list_2` and `list_1` respectively). If you convert `list_1` to a `set`/`frozenset` up front, the contains checks can be done in `O(1)`, making the total work `O(n)` on the length of `list_2` (technically, `O(max(n, m))`, since you do `O(m)` work to make the `set`). – ShadowRanger Dec 13 '16 at 16:57
4

If the number of occurrences should be taken into account, you probably need to use something like collections.Counter:

list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"] 
from collections import Counter
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1[key] != counts]

>>> final
['f', 'm']

As promised, this can also treat a differing number of occurrences as a "difference":

list_1=["a", "b", "c", "d", "e", 'a']
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1[key] != counts]

>>> final
['a', 'f', 'm']
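
If the goal is instead to keep each surplus occurrence (a multiset difference), Counter subtraction is one option. Note that this differs from the comprehension above, which lists each key once whenever the counts differ in either direction. The example data is made up for illustration:

```python
from collections import Counter

list_1 = ["a", "b", "c"]
list_2 = ["a", "a", "f", "m", "m"]  # hypothetical data with repeats

# Counter subtraction keeps only positive counts:
# "a" appears twice in list_2 and once in list_1, so one "a" remains.
surplus = Counter(list_2) - Counter(list_1)

# elements() repeats each key by its remaining count.
main_list = list(surplus.elements())
print(main_list)  # ['a', 'f', 'm', 'm']
```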
ShadowRanger
MSeifert
4

I tried two methods and found one more useful than the other. Here is my answer:

My input data:

crkmod_mpp = ['M13','M18','M19','M24']
testmod_mpp = ['M13','M14','M15','M16','M17','M18','M19','M20','M21','M22','M23','M24']

Method1: np.setdiff1d. I prefer this approach because it returns the result in sorted order, which here matches the order of the input

import numpy as np
test = np.setdiff1d(testmod_mpp, crkmod_mpp).tolist()
print(test)
['M14', 'M15', 'M16', 'M17', 'M20', 'M21', 'M22', 'M23']

Method2: gives the same elements as Method1 but in arbitrary order (one possible ordering shown)

test = list(set(testmod_mpp).difference(set(crkmod_mpp)))
print(test)
['M15', 'M16', 'M22', 'M23', 'M20', 'M14', 'M17', 'M21']

Method1 (np.setdiff1d) meets my requirements perfectly. This answer is for information.

Msquare
3

I would zip the lists together to compare them element by element.

main_list = [b for a, b in zip(list_1, list_2) if a != b]
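
One caveat: zip stops at the shorter list, so trailing elements of a longer list_2 are never compared. If those should count as differences, itertools.zip_longest with a sentinel is one way to handle it. A sketch, with made-up example data:

```python
from itertools import zip_longest

list_1 = [1, 2, 3]
list_2 = [1, 5, 3, 4]  # one element longer than list_1

_MISSING = object()  # sentinel that cannot equal any real element

# Plain zip: compares only the first len(list_1) positions.
truncated = [b for a, b in zip(list_1, list_2) if a != b]

# zip_longest: pads the shorter list, so the extra 4 is reported too.
padded = [
    b
    for a, b in zip_longest(list_1, list_2, fillvalue=_MISSING)
    if a != b and b is not _MISSING
]

print(truncated)  # [5]
print(padded)     # [5, 4]
```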
Patrick Haugh
  • If the OP wants to compare element by element (it's unclear, the example could go either way), this is _much_ more efficient than the other answers, since it's a single cheap pass over both `list`s with a single new `list` being constructed, no additional temporaries, no expensive containment checks, etc. – ShadowRanger Dec 13 '16 at 16:53
  • 1
    @ShadowRanger this would only work for element-wise difference which is a key point – ford prefect Sep 05 '17 at 14:37
  • @fordprefect: Yup. [My own answer](https://stackoverflow.com/a/41126821/364696) covers position-independent differences. – ShadowRanger Sep 06 '17 at 00:39
0

From ser1 remove items present in ser2.

Input

import pandas as pd
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

Solution

ser1[~ser1.isin(ser2)]

adnan
  • Welcome to Stack Overflow. This question has eight other responses, one of which has been accepted by the original poster. Please describe how your answer improves upon what's already been presented. – chb May 02 '19 at 07:17