15
x = [8,2,3,4,5]
y = [6,3,7,2,1]

How to find out the first common element in two lists (in this case, "2") in a concise and elegant way? Any list can be empty or there can be no common elements - in this case None is fine.

I need this to show python to someone who is new to it, so the simpler the better.

UPD: the order is not important for my purposes, but let's assume I'm looking for the first element in x that also occurs in y.

Makoto
  • 104,088
  • 27
  • 192
  • 230
georg
  • 211,518
  • 52
  • 313
  • 390
  • 2
    "2" is not the first common - "1" is – volcano Apr 20 '13 at 09:22
  • @thg435 You got 23.4k reputation, 666 answers, most of them related to python and some of them have almost 100 upvotes. What's going on? O.o –  Apr 20 '13 at 09:23
  • @MarkusMeskanen: being a programmer and teaching programming are different things. Of course, I do know how to get "2" from these lists, but I have hard time solving that without involving advanced concepts like sets, iterators or nested comprehensions. That's why I'm asking for help. – georg Apr 20 '13 at 09:30
  • 3
    Really, is `teaching` a tag now? – Martijn Pieters Apr 20 '13 at 09:41
  • 1
    @volcano why not? `x,y` are perfectly valid mathematical expressions, the same for `i,j,k` for iterators and iterators-like variables. Also when working with geometry `a,b,c,d` are quite common names for attributes in parametric equation. – Vyktor Apr 20 '13 at 09:46
  • 3
    @Vyktor Don't forget to mention `k,v` for iterations and `z` with `x,y`. Also, `n` often represents any natural number. –  Apr 20 '13 at 09:49
  • 4
    I'm not sure that the question is very well defined. Is the "first common element" always determined by the place in the `x` list? Could the answer be `3`, since it appears earlier in the `y` list (or even because the sum of its indecies in the two list is the smallest)? – Blckknght Apr 20 '13 at 09:52
  • 1
    Your question is unclear - what should `[1, 2, 3]` and `[3, 2, 1]` give? – Eric Apr 20 '13 at 09:54
  • thg435: would you mind showing us (or at least me) how you would done this? I'm curious about *23.4k-programmer*s non-educational answer ;) – Vyktor Apr 20 '13 at 10:00
  • @thg435: I am not sure I got the question correctly. Why is it only `2`, why not `3`? As I can see, `3` is at position `2` in list `y` and position `3` in list `x`. Does the order of traversing the list matters? – Abhijit Apr 20 '13 at 10:05
  • @Abhijit: I've added an explanation. – georg Apr 20 '13 at 10:08
  • 1
    @Vyktor: my answer would be "it depends". How big are lists (ten elements or ten millions), what items (ints, floats, strings) etc. There's no solution that would be equally good for all purposes. – georg Apr 20 '13 at 10:33
  • @Vyctor and Markus (if the latter was not sarcastic :-) ) - one-letter variables names have a tendency to get lost in the code and do not add to code understanding. "perfectly valid mathematical expressions" create unsupportable code. – volcano Apr 20 '13 at 11:06

10 Answers10

10

This should be straight forward and almost as effective as it gets (for more effective solution check Ashwini Chaudharys answer and for the most effective check jamylaks answer and comments):

result = None
# Go trough one array
for i in x:

    # The element repeats in the other list...
    if i in y:

        # Store the result and break the loop
        result = i
        break

Or event more elegant would be to encapsulate the same functionality to functionusing PEP 8 like coding style conventions:

def get_first_common_element(x,y):
    ''' Fetches first element from x that is common for both lists
        or return None if no such an element is found.
    '''
    for i in x:
        if i in y:
            return i

    # In case no common element found, you could trigger Exception
    # Or if no common element is _valid_ and common state of your application
    # you could simply return None and test return value
    # raise Exception('No common element found')
    return None

And if you want all common elements you can do it simply like this:

>>> [i for i in x if i in y]
[1, 2, 3]
Community
  • 1
  • 1
Vyktor
  • 20,559
  • 6
  • 64
  • 96
  • "Any list can be empty or there can be no common elements - in this case None is fine." I think you shouldn't raise an exception, just let the function return `None`. Also your indentation is wrong on the last 3 lines. –  Apr 20 '13 at 09:25
  • @MarkusMeskanen I've enumerated multiple options how to do this... I prefer raising exceptions to returning `None` (when you're writing large script that should process tens of thousands files a day it's easy to forget to check some error states... **for me** it's better to crash your script so you'll get full notification and handle it)... Of course this isn't ultimate rule. If he wants user to input correlated lists, it's in fact exception when no common element is found. And if it's common and **valid** state of application, return `None`. – Vyktor Apr 20 '13 at 09:32
  • Yeah I know, I would raise an exception too. However, OP is specifically asking for `return None`, so I just felt like mentioning :) –  Apr 20 '13 at 09:35
  • No offense, but you still have the extra space: http://oi36.tinypic.com/5dlbti.jpg Apologies for my extreme paint skills, you might get jelly. –  Apr 20 '13 at 09:45
  • @Vyktor Binary search is not the most effective solution... The fastest way to do this is O(N) with a set/hash map – jamylak Apr 20 '13 at 10:09
  • @jamylak I agree about hash map, but if you have input in list I believe that conversion to hash/set is slower than using BS. If you can provide documentation/example/proof that it's another way around I'll be glad to change my answer (I'm genuinely looking for the best solution, not just trying to defend my point). – Vyktor Apr 20 '13 at 10:16
  • @Vyktor you are wrong... To convert `y` into a set is `O(N)`, compared to Python timsort, which is `O(N log N)`, to check membership in a set is `O(1)` (constant amortized time), To iterate through every item in `x` and check it's membership is thus `O(N)` since the checks can be done in constant time. – jamylak Apr 20 '13 at 10:20
  • 1
    @jamylak thanks you've convinced me and I've modified answer, but I don't believe that checking membership existence is `O(1)` (unless you've used some sort of index), I believe it's possible just in `O(log(N))` – Vyktor Apr 20 '13 at 10:26
  • Thanks, I guess nested loops are easiest to explain. As you probably guess, efficiency doesn't matter much in my case. – georg Apr 20 '13 at 10:35
  • There's no point in returning `None`; every function that doesn't explicitly return a value, returns `None`. – Thijs van Dien Apr 20 '13 at 10:42
  • @ThijsvanDien [your argument is invalid](http://stackoverflow.com/a/4027624/1149736) (it's supposed to be quote, not to be offensive...) I consider explicit return to be good practice for code readability, therefor I put it there :) – Vyktor Apr 20 '13 at 10:53
  • 2
    @Vyktor Can you please just read what a hash map is and stop arguing a completely wrong point... – jamylak Apr 20 '13 at 11:00
  • @jamylak I'm clearly mistaken what hashmap is but I have troubles finding python related hashmap queries except link to python's dict... Could you provide any reading? – Vyktor Apr 20 '13 at 12:35
  • 1
    @Vyktor Just search the web, wikipedia, whatever, you don't have to be an expert just understand the concept if you want. btw I also posted timings in my answer for anyone else who doesn't believe me! – jamylak Apr 20 '13 at 12:59
  • @jamylak Yeah, I saw article on [wikipedia](http://en.wikipedia.org/wiki/Hashmap) and I'm **just curious** about the particular implementation you had in mind, because as I understand it you have to allocate space for each element of hash table (say hash is 16b, then it's `*(2**16)`) which seems be waste of space in cases like this or then use smaller hash dimension which would result in many collisions (higher complexity of *soft search*). And it seemed to me that you're proposing time-memory trade-off with 100% importance placed on time. – Vyktor Apr 20 '13 at 13:11
  • @Vyktor Considering that Python is built with an underlying implementation of dictionaries used in almost everything, we can assume that it's dictionaries are space efficient... They will increase in size when necessary, it starts out quadrupling and then doubles after reaching a really large size (50K) but it starts out small so you are unlikely to waste a lot of memory. – jamylak Apr 20 '13 at 13:27
10

A sort is not the fastest way of doing this, this gets it done in O(N) time with a set (hash map).

>>> x = [8,2,3,4,5]
>>> y = [6,3,7,2,1]
>>> set_y = set(y)
>>> next((a for a in x if a in set_y), None)
2

Or:

next(ifilter(set(y).__contains__, x), None)

This is what it does:

>>> def foo(x, y):
        seen = set(y)
        for item in x:
            if item in seen:
                return item
        else:
            return None


>>> foo(x, y)
2

To show the time differences between the different methods (naive approach, binary search an sets), here are some timings. I had to do this to disprove the suprising number of people that believed binary search was faster...:

from itertools import ifilter
from bisect import bisect_left

a = [1, 2, 3, 9, 1, 1] * 100000
b = [44, 11, 23, 9, 10, 99] * 10000

c = [1, 7, 2, 4, 1, 9, 9, 2] * 1000000 # repeats early
d = [7, 6, 11, 13, 19, 10, 19] * 1000000

e = range(50000) 
f = range(40000, 90000) # repeats in the middle

g = [1] * 10000000 # no repeats at all
h = [2] * 10000000

from random import randrange
i = [randrange(10000000) for _ in xrange(5000000)] # some randoms
j = [randrange(10000000) for _ in xrange(5000000)]

def common_set(x, y, ifilter=ifilter, set=set, next=next):
    return next(ifilter(set(y).__contains__, x), None)
    pass

def common_b_sort(x, y, bisect=bisect_left, sorted=sorted, min=min, len=len):
    sorted_y = sorted(y)
    for a in x:
        if a == sorted_y[min(bisect_left(sorted_y, a),len(sorted_y)-1)]:
            return a
    else:
        return None

def common_naive(x, y):
    for a in x:
        for b in y:
            if a == b: return a
    else:
        return None

from timeit import timeit
from itertools import repeat
import threading, thread

print 'running tests - time limit of 20 seconds'

for x, y in [('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'h'), ('i', 'j')]:
    for func in ('common_set', 'common_b_sort', 'common_naive'):        
        try:
            timer = threading.Timer(20, thread.interrupt_main)   # 20 second time limit
            timer.start()
            res = timeit(stmt="print '[', {0}({1}, {2}), ".format(func, x, y),
                         setup='from __main__ import common_set, common_b_sort, common_naive, {0}, {1}'.format(x, y),
                         number=1)
        except:
            res = "Too long!!"
        finally:
            print '] Function: {0}, {1}, {2}. Time: {3}'.format(func, x, y, res)
            timer.cancel()

The test data was:

a = [1, 2, 3, 9, 1, 1] * 100000
b = [44, 11, 23, 9, 10, 99] * 10000

c = [1, 7, 2, 4, 1, 9, 9, 2] * 1000000 # repeats early
d = [7, 6, 11, 13, 19, 10, 19] * 1000000

e = range(50000) 
f = range(40000, 90000) # repeats in the middle

g = [1] * 10000000 # no repeats at all
h = [2] * 10000000

from random import randrange
i = [randrange(10000000) for _ in xrange(5000000)] # some randoms
j = [randrange(10000000) for _ in xrange(5000000)]

Results:

running tests - time limit of 20 seconds
[ 9 ] Function: common_set, a, b. Time: 0.00569520707241
[ 9 ] Function: common_b_sort, a, b. Time: 0.0182240340602
[ 9 ] Function: common_naive, a, b. Time: 0.00978832505249
[ 7 ] Function: common_set, c, d. Time: 0.249175872911
[ 7 ] Function: common_b_sort, c, d. Time: 1.86735751332
[ 7 ] Function: common_naive, c, d. Time: 0.264309220865
[ 40000 ] Function: common_set, e, f. Time: 0.00966861710078
[ 40000 ] Function: common_b_sort, e, f. Time: 0.0505980508696
[ ] Function: common_naive, e, f. Time: Too long!!
[ None ] Function: common_set, g, h. Time: 1.11300018578
[ None ] Function: common_b_sort, g, h. Time: 14.9472068377
[ ] Function: common_naive, g, h. Time: Too long!!
[ 5411743 ] Function: common_set, i, j. Time: 1.88894859542
[ 5411743 ] Function: common_b_sort, i, j. Time: 6.28617268396
[ 5411743 ] Function: common_naive, i, j. Time: 1.11231867458

This gives you an idea of how it will scale for larger inputs, O(N) vs O(N log N) vs O(N^2)

jamylak
  • 128,818
  • 30
  • 231
  • 230
  • IMO, this reads much better [as a generator expression](http://stackoverflow.com/a/16119274/102441) – Eric Apr 20 '13 at 10:28
  • Well, python is not the best language to learn functional programming. – georg Apr 20 '13 at 10:44
  • @Eric True, this was mostly for fun though... If I was really gonna do it I would just use a generator expression as you said. Update: I added the generator one now. – jamylak Apr 20 '13 at 10:46
  • +1. does the O(N log N) cost for binary search-based solution above account for the expense of sorting involved in getting to sorted_y? – iruvar Apr 20 '13 at 14:07
  • Yes, that's what I meant since the sorting sets the lower bound. – jamylak Apr 20 '13 at 14:15
6

One liner, using next to take the first item from a generator:

x = [8,2,3,4,5]
y = [6,3,7,2,1]

first = next((a for a in x if a in y), None)

Or more efficiently since set.__contains__ is faster than list.__contains__:

set_y = set(y)
first = next((a for a in x if a in set_y), None)

Or more efficiently but still in one line (don't do this):

first = next((lambda set_y: a for a in x if a in set_y)(set(y)), None)
Eric
  • 95,302
  • 53
  • 242
  • 374
3

Using a for loops with in will result in a O(N^2) complexity, but you can sort y here and use binary search to improve the time complexity to O(NlogN).

def binary_search(lis,num):
    low=0
    high=len(lis)-1
    ret=-1  #return -1 if item is not found
    while low<=high:
        mid=(low+high)//2
        if num<lis[mid]:
            high=mid-1
        elif num>lis[mid]:
            low=mid+1
        else:
            ret=mid
            break

    return ret

x = [8,2,3,4,5]
y = [6,3,7,2,1]
y.sort()

for z in x:
    ind=binary_search(y,z)
    if ind!=-1
        print z
        break

output: 2

Using the bisect module to perform the same thing as above:

import bisect

x = [8,2,3,4,5]
y = [6,3,7,2,1]
y.sort()

for z in x:
    ind=bisect.bisect(y,z)-1  #or use `ind=min(bisect.bisect_left(y, z), len(y) - 1)`
    if ind!=-1 and y[ind] ==z:
        print z      #prints 2
        break     
Ashwini Chaudhary
  • 244,495
  • 58
  • 464
  • 504
  • However not very newbie friendly ;) – tyteen4a03 Apr 20 '13 at 09:37
  • Haha,+1, love your answer... (And removed note on effectiveness from mine)... But I'm afraid it's not newbie friendly... But still definitely worth mentioning for *academical purposes*. ;) – Vyktor Apr 20 '13 at 09:39
  • What is `loops`? what if `-1` is in the list? – GP89 Apr 20 '13 at 09:41
  • +1 for Binary Search. There's a huge difference between `O(N^2)` and `O(NlogN)`. – Thanakron Tandavas Apr 20 '13 at 09:42
  • 1
    @GP89 binary search returns index and `for z in x: if z in y` is actually equivalent to two `for` loops. – Ashwini Chaudhary Apr 20 '13 at 09:43
  • I meant the `loops` variable, `loops+=1`. And yea didn't notice it was returning the index, read it too fast :P – GP89 Apr 20 '13 at 09:49
  • @GP89 Thanks!, forgot to remove that. I was using it to count the number of loops performed to find an item. https://gist.github.com/monty-singh/5135029 – Ashwini Chaudhary Apr 20 '13 at 09:53
  • btw: Would you mind adding link where is this algorithm explained (or commenting it little better; your choice)? I'm curious about little theory around it not, not just *design pattern: type-writer" (copy-paste). – Vyktor Apr 20 '13 at 09:58
  • 1
    @Vyktor BS is very famous, you can find tons of tutorials on it. [Youtube - What is binary search](http://www.youtube.com/watch?v=j5uXyPJ0Pew), [WikiPedia - Binary search Algo](http://en.wikipedia.org/wiki/Binary_search_algorithm) – Ashwini Chaudhary Apr 20 '13 at 10:10
  • Thanks, binary search is definitely on my list. – georg Apr 20 '13 at 10:36
  • @AshwiniChaudhary Do you know how many times that worst case for sets will occur during your lifetime? Probably none! :P But seriously, that worse case assumes a hash collision on every item, which is very very very unlikely to happen! – jamylak Apr 20 '13 at 10:52
  • @AshwiniChaudhary Your comment is highly misleading, I suggest you read this: http://stackoverflow.com/a/15192095/1219006. In this case we **know** we aren't gonna have a hash collision with Python's integers! So it's O(1) for every check! – jamylak Apr 20 '13 at 10:58
  • @jamylak I agree with that, but when considering the worst case we should assume the worst possibilities that can possibly occur. – Ashwini Chaudhary Apr 20 '13 at 10:59
  • @AshwiniChaudhary Ok then throw quicksort out the window... The fact of the matter is... just say we get two lists of 1,000,000 items each. My code might be over 1000 times faster than yours (maybe exaggerating but you get the point). – jamylak Apr 20 '13 at 11:01
  • @jamylak I'm convinced(thanks for the link) ;), I've removed the set complexity part. – Ashwini Chaudhary Apr 20 '13 at 11:10
  • 1
    @AshwiniChaudhary Thank you :D Was just gonna run timings, btw you do know there is a `bisect` module, it is very much neglected though it does what you want in one line – jamylak Apr 20 '13 at 11:12
  • @jamylak Yes, but I've only used `bisect.insort` once. BTW `bisect.bisect` seems perfect here. Thanks again.:) – Ashwini Chaudhary Apr 20 '13 at 11:20
  • @AshwiniChaudhary That doesn't work though. This should do it `y[bisect.bisect_left(y, find)] == find` – jamylak Apr 20 '13 at 11:27
  • @jamylak `y[bisect.bisect(y,find)-1]==find` works fine while `y[bisect.bisect_left(y, find)] ` and `y[bisect.bisect_left(y, find)-1]` doesn't work for `x = [15,3,2,3,4,5]`. – Ashwini Chaudhary Apr 20 '13 at 12:01
  • @AshwiniChaudhary actually that's flawed too because you can get an index of `-1`. use something like `min(bisect.bisect_left(y, find), len(y) - 1)` – jamylak Apr 20 '13 at 13:08
3

I assume you want to teach this person Python, not just programming. Therefore I do not hesitate to use zip instead of ugly loop variables; it's a very useful part of Python and not hard to explain.

def first_common(x, y):
    common = set(x) & set(y)
    for current_x, current_y in zip(x, y):
        if current_x in common:
            return current_x
        elif current_y in common:
            return current_y

print first_common([8,2,3,4,5], [6,3,7,2,1])

If you really don't want to use zip, here's how to do it without:

def first_common2(x, y):
    common = set(x) & set(y)
    for i in xrange(min(len(x), len(y))):
        if x[i] in common:
            return x[i]
        elif y[i] in common:
            return y[i]

And for those interested, this is how it extends to any number of sequences:

def first_common3(*seqs):
    common = set.intersection(*[set(seq) for seq in seqs])
    for current_elements in zip(*seqs):
        for element in current_elements:
            if element in common:
                return element

Finally, please note that, in contrast to some other solutions, this works as well if the first common element appears first in the second list.

I just noticed your update, which makes for an even simpler solution:

def first_common4(x, y):
    ys = set(y) # We don't want this to be recreated for each element in x
    for element in x:
        if element in ys:
            return element

The above is arguably more readable than the generator expression.

Too bad there is no built-in ordered set. It would have made for a more elegant solution.

Thijs van Dien
  • 6,516
  • 1
  • 29
  • 48
  • 2
    `izip_longest()` could be replaced by the more common `zip()`: there is no need to keep searching in the longest sequence when the shortest one is exhausted, because any common element would have already been found in the shortest list. – Eric O. Lebigot Apr 20 '13 at 12:13
  • This is probably the best answer still. Thanks! – blakev Jul 21 '18 at 03:42
1

Using for loops seems easiest to explain to someone new.

for number1 in x:
    for number2 in y:
        if number1 == number2:
            print number1, number2
            print x.index(number1), y.index(number2)
            exit(0)
print "No common numbers found."

NB Not tested, just out of my head.

Pythonidae
  • 106
  • 4
  • 1
    Actually, the second loop is not necessary - see the top answer. – georg Apr 20 '13 at 10:44
  • For functionality it's not, I thought it would help show how the program worked whilst teaching. But yes, I agree that this would be too long if used in an efficient program. – Pythonidae Apr 23 '13 at 16:58
1

This one uses sets. It returns the first common element or None if no common element.

def findcommon(x,y):
    common = None
    for i in range(0,max(len(x),len(y))):
        common = set(x[0:i]).intersection(set(y[0:i]))
        if common: break
    return list(common)[0] if common else None
jujaro
  • 447
  • 2
  • 4
1
def first_common_element(x,y):
    common = set(x).intersection(set(y))
    if common:
        return x[min([x.index(i)for i in common])]
JJJW
  • 41
  • 4
1

Just for fun (probably not efficient), another version using itertools:

from itertools import dropwhile, product
from operator import __ne__

def accept_pair(f):
    "Make a version of f that takes a pair instead of 2 arguments."
    def accepting_pair(pair):
        return f(*pair)
    return accepting_pair

def get_first_common(x, y):
    try:
        # I think this *_ unpacking syntax works only in Python 3
        ((first_common, _), *_) = dropwhile(
            accept_pair(__ne__),
            product(x, y))
    except ValueError:
        return None
    return first_common

x = [8, 2, 3, 4, 5]
y = [6, 3, 7, 2, 1]
print(get_first_common(x, y))  # 2
y = [6, 7, 1]
print(get_first_common(x, y))  # None

It is simpler, but not as fun, to use lambda pair: pair[0] != pair[1] instead of accept_pair(__ne__).

bli
  • 7,549
  • 7
  • 48
  • 94
0

Use set - this is the generic solution for arbitrary number of lists:

def first_common(*lsts):
    common = reduce(lambda c, l: c & set(l), lsts[1:], set(lsts[0]))
    if not common:
        return None
    firsts = [min(lst.index(el) for el in common) for lst in lsts]
    index_in_list = min(firsts)
    trgt_lst_index = firsts.index(index_in_list)
    return lsts[trgt_lst_index][index_in_list]

An afterthought - not an effective solution, this one reduces redundant overhead

def first_common(*lsts):
    common = reduce(lambda c, l: c & set(l), lsts[1:], set(lsts[0]))
    if not common:
        return None
    for lsts_slice in itertools.izip_longest(*lsts):
        slice_intersection = common.intersection(lsts_slice)
        if slice_intersection:
            return slice_intersection.pop()
volcano
  • 3,578
  • 21
  • 28