Fastest way of comparing lists of strings

Question

A = ['a','b','c'] 
B = ['d','b','e']

res = [i for i in A if i in B]

The above code does not finish in a reasonable time when A has 300,000 elements and B has 200,000.

How can I solve it?

I also tried

res = {i for i in A if i in B}
res = list(res)

But still could not get the result.

Change `B` to a set first: `B = set(B)`, then use the list comprehension. — Ashwini Chaudhary, Jul 17 '14 at 10:04
What about `A = set(A)` and `B = set(B)`, then do the intersection `res = A & B`? — Maciej Gol, Jul 17 '14 at 10:04
And what type of comparison do you want to do? When comparing lists, I assume lists are equal only if their elements are in the same order, too. — sharcashmo, Jul 17 '14 at 10:04
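A rough way to see why converting `B` to a set helps: `i in B` on a list scans it element by element, so the original comprehension does up to len(A) × len(B) comparisons (300,000 × 200,000 = 6 × 10^10 at the question's sizes), while a set lookup is a hash probe. The sketch below is a hypothetical benchmark with scaled-down, randomly generated data; absolute timings will differ between machines, so no numbers are claimed here.

```python
import random
import string
import timeit

# Hypothetical test data: random 6-letter strings, scaled down from the
# question's 300000/200000 so the list-based version finishes quickly.
random.seed(0)

def random_word(length=6):
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))

A = [random_word() for _ in range(3000)]
# Reuse part of A so the two lists actually overlap.
B = A[:1000] + [random_word() for _ in range(1000)]

def with_list():
    # `i in B` scans the list: O(len(B)) per lookup, O(len(A) * len(B)) total.
    return [i for i in A if i in B]

def with_set():
    # Build the set once; each `i in set_b` is O(1) on average.
    set_b = set(B)
    return [i for i in A if i in set_b]

print("list:", timeit.timeit(with_list, number=3))
print("set: ", timeit.timeit(with_set, number=3))
```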

Padraic Cunningham · score 3 · Answer 1 · 2014-07-17T10:21:15.610

A = ['a','b','c']
B = ['d','b','e']

set(A).intersection(B)

To get a list returned:

list(set(A).intersection(B))

`intersection` accepts any iterable as its argument, so only `A` needs to be made into a set.

Note, the non-operator versions of union(), intersection(), difference(), and symmetric_difference() will accept any iterable as an argument.
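A small illustration of that difference, using the question's sample lists: the `intersection()` method happily takes a list, tuple, or generator, while the `&` operator only works when both operands are sets.

```python
A = ['a', 'b', 'c']
B = ['d', 'b', 'e']

set_a = set(A)

# The method form accepts any iterable: a list, tuple, generator, ...
print(set_a.intersection(B))             # {'b'}
print(set_a.intersection(tuple(B)))      # {'b'}
print(set_a.intersection(x for x in B))  # {'b'}

# The operator form requires both operands to be sets (or frozensets).
try:
    set_a & B
except TypeError as exc:
    print("TypeError:", exc)

print(set_a & set(B))                    # {'b'}
```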

edited Jul 17 '14 at 10:21

answered Jul 17 '14 at 10:06

Padraic Cunningham

Should that be `(set(B))`? – Tim Jul 17 '14 at 10:07
@TimCastelijns, why would B need to be a set? – Padraic Cunningham Jul 17 '14 at 10:07
This will also remove the redundant items from `A` and change their order (if order matters); the OP's first example doesn't do that. – Ashwini Chaudhary Jul 17 '14 at 10:10
Not making B a `set` increases the time by 50% on my computer for a test case of the size the OP specified. – deinonychusaur Jul 17 '14 at 10:10
@deinonychusaur well if the OP has lists how are they going to magically become sets? – Padraic Cunningham Jul 17 '14 at 10:11
I was just asking for myself :-) didn't mean to imply this was wrong – Tim Jul 17 '14 at 10:11
`set(A).intersection(set(B))`: I assume the difference in performance comes from redundancies in `B`. – deinonychusaur Jul 17 '14 at 10:12
@TimCastelijns, no worries I thought I might have been missing something – Padraic Cunningham Jul 17 '14 at 10:12
@undefinedisnotafunction, the OP was using a set in the question, so I don't think the order or redundant items are an issue. – Padraic Cunningham Jul 17 '14 at 10:20
@deinonychusaur, there was a difference of exactly 0.2 ms on a list with 1000 items between `set(A).intersection(set(B))` and `set(A).intersection(B)`. – Padraic Cunningham Jul 17 '14 at 10:26
Only the left-hand side needs to be a set, as shown in the answer here. The intersection operation will proceed by iterating once over the right-hand-side list, grabbing the intersecting elements (by testing if they are `in` the left-hand-side set). This is more efficient because creating a set first would require doing a similar iteration anyway, and then the native set operation. – Karl Knechtel Jul 17 '14 at 10:26
@PadraicCunningham I was testing with the sizes specified in the question, but, as I said, making the second argument a `set` is only relevant if there are many redundancies in `B`; if there are, it will be a significant improvement. – deinonychusaur Jul 17 '14 at 10:28
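The behaviour Karl Knechtel describes can be sketched in pure Python. This is only an illustration of the idea (one pass over the right-hand iterable, keeping items that are `in` the left-hand set), not CPython's actual C implementation, which differs in details. It also shows where duplicates in `B` cost something: each repeated item triggers another lookup before collapsing into the result set.

```python
def intersection_sketch(left_set, iterable):
    """Rough pure-Python equivalent of left_set.intersection(iterable)
    when `iterable` is not itself a set: one pass over the iterable,
    keeping the items that are found in the left-hand set."""
    result = set()
    for item in iterable:
        if item in left_set:   # average-case O(1) hash lookup
            result.add(item)   # duplicates in `iterable` collapse here
    return result

A = ['a', 'b', 'c']
B = ['d', 'b', 'e', 'b']
print(intersection_sketch(set(A), B))                              # {'b'}
print(intersection_sketch(set(A), B) == set(A).intersection(B))    # True
```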

score 3 · Accepted Answer · answered Jul 17 '14 at 10:25

If preserving order and/or duplicates doesn't matter, then you can use

A = ['a', 'b', 'c']
B = ['d', 'e', 'f']
res = list(set(A) & set(B))

If order and/or duplicates do matter, then you can use

A = ['a', 'b', 'c']
B = ['d', 'e', 'f']
set_b = set(B)
res = [i for i in A if i in set_b]
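To make the difference between the two versions visible, here is a sketch with hypothetical lists that contain a duplicate and an overlap (the answer's own sample lists share no elements, so both versions would return an empty list).

```python
A = ['c', 'a', 'b', 'a']
B = ['a', 'x', 'c']

# Set-based: duplicates collapse and the result order is arbitrary.
unordered = list(set(A) & set(B))
print(sorted(unordered))  # ['a', 'c']  (sorted only to make the output stable)

# Membership against a set built once from B: keeps A's order and duplicates.
set_b = set(B)
ordered = [i for i in A if i in set_b]
print(ordered)            # ['c', 'a', 'a']
```

Both run in roughly linear time; the second simply trades the set result for a list that mirrors `A`.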

answered Jul 17 '14 at 10:25

Khaelex

The second one can also be written (although somewhat less readably) as `filter(set(B).__contains__, A)`; in Python 3, wrap it in `list(...)` to get a list. – Jon Clements Jul 17 '14 at 10:31
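A sketch of the `filter` variant, reusing hypothetical sample lists with a duplicate so the order-preserving behaviour is visible. `set_b.__contains__` is the bound method that `x in set_b` calls under the hood.

```python
A = ['c', 'a', 'b', 'a']
B = ['a', 'x', 'c']

set_b = set(B)
# In Python 3, filter() returns a lazy iterator, so wrap it in list().
res = list(filter(set_b.__contains__, A))
print(res)                                  # ['c', 'a', 'a']

# Same result as the order-preserving list comprehension.
print(res == [i for i in A if i in set_b])  # True
```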

score 1 · Answer 3 · answered Jul 17 '14 at 10:05

You are essentially computing the intersection of two sets. Using the set data type makes this efficient, because a membership test on a set is an average-case O(1) hash lookup instead of a scan through a list:

A = {'a','b','c'}
B = {'d','b','e'}
res = A & B

answered Jul 17 '14 at 10:05

Sven Marnach

How do I put the lists A and B into {}? – bas Jul 17 '14 at 10:11
@bas just do `set(A) & set(B)` – deinonychusaur Jul 17 '14 at 10:20
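A short sketch of the ways to turn existing lists into sets. `set(...)` works in every Python version; the `{*B}` unpacking form needs Python 3.5 or newer. Note that a bare `{}` creates an empty dict, not an empty set.

```python
A = ['a', 'b', 'c']
B = ['d', 'b', 'e']

# Two equivalent ways to build a set from a list.
set_a = set(A)
set_b = {*B}          # set display with unpacking, Python 3.5+

print(set_a & set_b)  # {'b'}

# An empty pair of braces is a dict, not a set.
print(type({}))       # <class 'dict'>
print(type(set()))    # <class 'set'>
```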