Python - most efficient way of matching a list

Question

Say for example I have a list of 7 numbers, let's call it 'lineA' and I have 50 lines of numbers below and I want to see if 'lineA' matches my 50 lines EXACTLY in that order. What is the quickest (time-wise)/most efficient way of doing it? A loop? Or any other method?

lineA = [1,2,3,4,5,6,7]

lineTwo = [1,33,40,44,45,1,2]
lineThree = [2,13,22,41,50,8,9]
lineFour = [1,2,3,4,5,6,7]
lineFive = etc.....(repeat this 50 times)

Thank you

If you have 2 or more variables with numbers in their names, you almost certainly wanted a list of lists, not 50 variables. — Wooble, Jul 25 '13 at 16:00
Output should just be a simple print saying 'lineA matches lineFour' or whatever line it matches. — BubbleMonster, Jul 25 '13 at 16:09
quickest to run.....however, I would like it to be quick to write as well... I don't know if that's possible though? — BubbleMonster, Jul 25 '13 at 16:11
http://stackoverflow.com/questions/364621/python-get-position-in-list That link and the fact that `[1, 2] == [1, 2]` -> `True` answers your question. — RussW, Jul 25 '13 at 16:53

score 2 · Accepted Answer · edited Jul 25 '13 at 16:45

You can compare two lists with the == operator:

if lineA == lineOne:
    print 'they match!'
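As a quick illustration of what == means for lists, here is a small sketch written for Python 3. The values other than lineA are made up for the example and are not from the question. Two lists compare equal only when they have the same length and the same elements in the same order.

```python
line_a = [1, 2, 3, 4, 5, 6, 7]

# Illustrative rows (assumed values, not taken from the question).
same_order = [1, 2, 3, 4, 5, 6, 7]
shuffled = [7, 6, 5, 4, 3, 2, 1]
shorter = [1, 2, 3]

print(line_a == same_order)  # True
print(line_a == shuffled)    # False: same numbers, different order
print(line_a == shorter)     # False: different length
```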

Now, keep all your lines in a list:

lines = [lineOne, lineTwo, lineThree, ..., lineFifty]

And just find the lines that match:

matches = [line for line in lines if line == lineA]

You can't really get more efficient than comparing each line (O(n)), unless you sort your input first. Then you could use the bisect module and look up a line in O(log n). This is worth doing if you also want to compare lineB, lineC, ..., lineZ against lines. Otherwise don't bother, because sorting has to compare the lines as well, at O(n log n).
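A minimal sketch of the sort-then-bisect idea, written for Python 3. The helper name contains and the variable sorted_lines are hypothetical, and the rows are the three shown in the question. Sorting loses the original positions, so this only answers whether a line is present, not which line it was.

```python
import bisect

line_a = [1, 2, 3, 4, 5, 6, 7]
lines = [
    [1, 33, 40, 44, 45, 1, 2],
    [2, 13, 22, 41, 50, 8, 9],
    [1, 2, 3, 4, 5, 6, 7],
]

# Sort once, O(n log n). Lists are ordered element by element.
sorted_lines = sorted(lines)


def contains(sorted_rows, target):
    """Binary search: return True if target occurs in sorted_rows."""
    i = bisect.bisect_left(sorted_rows, target)
    return i < len(sorted_rows) and sorted_rows[i] == target


# Each lookup after the sort costs O(log n) list comparisons.
print(contains(sorted_lines, line_a))                 # True
print(contains(sorted_lines, [9, 9, 9, 9, 9, 9, 9]))  # False
```

The one-off sort only pays for itself when many different lines are looked up against the same data.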

edited Jul 25 '13 at 16:45

bedwyr

answered Jul 25 '13 at 16:06

Daren Thomas


What would be the point of `matches`? All it reveals is how many lines matched. – arshajii Jul 25 '13 at 16:09
Use enumerate to get the index position to store in matches instead of the list that matches. – sage88 May 20 '16 at 20:35

score 2 · Answer 2 · answered Jul 25 '13 at 16:15

First, create a list containing all your lists.

lines = [lineOne, lineTwo, ..., lineFifty]

Then you can use the following:

outs = [ind for ind,val in enumerate(lines) if val == lineA]

This comprehension is O(n), I believe, since it checks each value in lines exactly once. outs now holds the indices at which lines contains a list equal to lineA. It is also a one-liner, which is about as 'fast to write' as you can get. From the indices you can work out which line was matched.
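To turn the indices into the kind of message asked for in the comments, one option is to keep a parallel list of names. This is only a sketch for Python 3: the names list is an assumption added for illustration, and the rows are the three shown in the question.

```python
line_a = [1, 2, 3, 4, 5, 6, 7]

# Hypothetical labels, one per row, so a match can be reported by name.
names = ["lineTwo", "lineThree", "lineFour"]
lines = [
    [1, 33, 40, 44, 45, 1, 2],
    [2, 13, 22, 41, 50, 8, 9],
    [1, 2, 3, 4, 5, 6, 7],
]

# Indices of every row equal to line_a.
outs = [ind for ind, val in enumerate(lines) if val == line_a]

# Build one message per matching index and print it.
messages = ["lineA matches " + names[ind] for ind in outs]
for message in messages:
    print(message)
```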

answered Jul 25 '13 at 16:15

A.Wan


This is a better answer than Daren Thomas' because it stores the index instead of just the same list over and over. – sage88 May 20 '16 at 20:37
I agree, seems like having the index could be useful - am I the only one who doesn't get the use case of this question at all? – Daren Thomas May 23 '16 at 08:24

Eiyrioü von Kauyf · Answer 3 · 2013-07-25T16:44:32.737

NumPy's array_equal might improve runtime over the approaches above. Frankly, though, you can't do better than comparing the elements lexicographically, which is what the list == operator already does, unless this is a streaming list or there is some other aspect you are omitting.

A benchmark on my very slow computer (you should be able to get a larger difference):

import timeit

# Plain-list version: keep every row of a that equals b.
testeq = """\
a = [range(randint(0, 10), 100) for x in xrange(500)]
b = range(5, 100)
c = [row for row in a if row == b]
"""

# NumPy version. np.array_equal(a, b) compares the two arrays as wholes
# and returns a single bool; it does not test a row by row.
testnpeq = """\
a = np.array([range(randint(0, 10), 100) for x in xrange(500)])
b = np.array(range(5, 100))
c = np.where(np.array_equal(a, b))
"""

print 'using == operator', timeit.timeit(testeq, setup="from random import randint", number=10000)

print 'using np operator', timeit.timeit(testnpeq, setup="from random import randint\nimport numpy as np", number=10000)

using == operator 13.5805370808

using np operator 12.8217720985

Note: these two figures were recorded with testeq passed to both timeit calls, so they time the same list code twice rather than the NumPy version. Each run also builds its own random lists inside the timed code; building the input once, outside the timing, should give lower and more comparable numbers.
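A sketch of how the list version could be timed on fixed input, using only the standard library and written for Python 3. The seed, the helper name find_matches and the run count of 1000 are assumptions chosen for illustration; no timing figures are claimed here.

```python
import random
import timeit

random.seed(0)  # fixed seed so every run sees the same data

# 500 rows, each a run of consecutive integers ending at 99
# with a random starting point between 0 and 10.
rows = [list(range(random.randint(0, 10), 100)) for _ in range(500)]
target = list(range(5, 100))


def find_matches():
    """Return every row equal to target, using the list == operator."""
    return [row for row in rows if row == target]


# The data is built once above, so only the comparisons are timed.
elapsed = timeit.timeit(find_matches, number=1000)
print("list == comparison: %.4f s for 1000 runs" % elapsed)
```

Because the rows are created before timing starts, the measurement reflects the comparisons alone rather than the cost of generating random data.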

tl;dr: the only way to do the comparison is lexicographically; use numpy / C
