Intersection of Two Lists Of Strings

Question

I had an interview question along these lines:

Given two lists of unordered customers, return a list of the intersection of the two lists. That is, return a list of the customers that appear in both lists.

Some things I established:

Assume each customer has a unique name
If the name is the same in both lists, it's the same customer
The names are of the form first name last name
There's no trickery of II's, Jr's, weird characters, etc.

I think the point was to find an efficient algorithm/use of data structures to do this as efficiently as possible.

My progress went like this:

Read one list in to memory, then read the other list one item at a time to see if there is a match
Alphabetize both lists then start at the top of one list and see if each item appears in the other list
Put both lists into ordered lists, then use the shorter list to check item by item (that way, it one list has 2 items, you only check those 2 items)
Put one list into a hash, and check for the existence of keys from the other list

The interviewer kept asking, "What next?", so I assume I'm missing something else.

Any other tricks to do this efficiently?

Side note, this question was in python, and I just read about sets, which seem to do this as efficiently as possible. Any idea what the data structure/algorithm of sets is?

Hashing will be much helpful in your problem that may provide solution in O(n). — Akashdeep Saluja, Oct 07 '12 at 09:41

Joran Beasley · Answer 1 · 2012-10-07T02:33:43.540

6

It really doesnt matter how its implemented ... but I believe it is implemented in C so it is faster and better set([1,2,3,4,5,6]).intersection([1,2,5,9]) is likely what they wanted

In python readability counts for alot! and set operations in python are used extensively and well vetted...

that said another pythonic way of doing it would be

list_new = [itm for itm in listA if itm in listB]

or

list_new = filter(lambda itm:itm in listB,listA)

basically I believe they were testing if you were familliar with python, not if you could implement the algorithm. since they asked a question that is so well suited to python

edited Oct 07 '12 at 02:33

answered Oct 07 '12 at 02:23

Joran Beasley

110,522
12
160
179

Do you know what `[itm for itm in listA if itm in listB]` is doing behind the scenes? I didn't realize you could create a list in python like that? I would have thought to `append` to the list in a `for` loop, but this is much cleaner. – Tyler DeWitt Oct 07 '12 at 07:42
Its a list comprehension it is one of the things python is optimized for. It creates a new list from the items in ListA but only if they exist in listB. but once again they were not testing if you could come up with an algorithm they were testing if you were familiar with python constructs – Joran Beasley Oct 07 '12 at 18:03
Thanks. My geekiness was just intrigued by what was going on there. – Tyler DeWitt Oct 08 '12 at 05:28

score 1 · Accepted Answer · answered Oct 07 '12 at 02:19

Put one list into a bloom filter and use that to filter the second list.
Put the filtered second list into a bloom filter and use that to filter the first list.
Sort the two lists and find the intersection by one of the methods above.

The benefit of this approach (besides letting you use a semi-obscure data structure correctly in an interview) is that it doesn't require any O(n) storage until after you have (with high probability) reduced the problem size.

The interviewer kept asking, "What next?", so I assume I'm missing something else.

Maybe they would just keep asking that until you run out of answers.

http://code.google.com/p/python-bloom-filter/ is a python implementation of bloom filters.

Intersection of Two Lists Of Strings

2 Answers2

Linked