-4

Say I have a list of 7 numbers, call it 'lineA', and 50 more lines of numbers below it. I want to see whether 'lineA' matches any of those 50 lines EXACTLY, in that order. What is the quickest (time-wise) and most efficient way of doing it? A loop, or some other method?

lineA = [1,2,3,4,5,6,7]

lineTwo = [1,33,40,44,45,1,2]
lineThree = [2,13,22,41,50,8,9]
lineFour = [1,2,3,4,5,6,7]
lineFive = etc.....(repeat this 50 times)

Thank you

BubbleMonster

3 Answers

2

You can compare two lists with the == operator:

if lineA == lineOne:
    print('they match!')
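As a quick check of what == means for lists, here is a minimal sketch using lineA from the question. The names same_order, reordered and shorter are made up for illustration. Two lists compare equal only when they have the same length and the same elements in the same positions.

```python
lineA = [1, 2, 3, 4, 5, 6, 7]

# Hypothetical comparison lists, not taken from the question.
same_order = [1, 2, 3, 4, 5, 6, 7]
reordered = [7, 6, 5, 4, 3, 2, 1]
shorter = [1, 2, 3, 4, 5, 6]

print(lineA == same_order)  # same elements, same order
print(lineA == reordered)   # same elements, different order
print(lineA == shorter)     # different length
```

Only the first comparison is true, so == already covers the "exactly in that order" requirement.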

Now, keep all your lines in a list:

lines = [lineOne, lineTwo, lineThree, ..., lineFifty]

And just find the lines that match:

matches = [line for line in lines if line == lineA]

You can't really get more efficient than comparing each line, which is O(n) in the number of lines, unless you sort your input first. Then you could use the bisect module and get O(log n) lookups. Do this if you also want to compare lineB, lineC, ..., lineZ against lines. Otherwise don't bother, because sorting itself has to compare the lines and costs O(n log n).
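The sort-then-bisect idea could look roughly like the sketch below. The helper name contains and the sample data are assumptions for illustration. It relies on Python ordering lists lexicographically, so a sorted list of lists can be binary-searched.

```python
import bisect

def contains(sorted_lines, target):
    """Binary-search sorted_lines for target using O(log n) list comparisons."""
    i = bisect.bisect_left(sorted_lines, target)
    # bisect_left returns the insertion point; confirm an actual match there.
    return i < len(sorted_lines) and sorted_lines[i] == target

# Sample data taken from the question (only three of the fifty lines).
lines = [
    [1, 33, 40, 44, 45, 1, 2],
    [2, 13, 22, 41, 50, 8, 9],
    [1, 2, 3, 4, 5, 6, 7],
]

# Sort once (O(n log n)), then reuse the sorted copy for many lookups.
sorted_lines = sorted(lines)

found_a = contains(sorted_lines, [1, 2, 3, 4, 5, 6, 7])
found_b = contains(sorted_lines, [9, 9, 9, 9, 9, 9, 9])
print(found_a)
print(found_b)
```

Note that sorting discards the original line positions, so this only answers whether a match exists, not where it was in the input.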

bedwyr
Daren Thomas
2

First, create a list containing all your lists.

lines = [lineOne, lineTwo, ..., lineFifty]

Then you can use the following:

outs = [ind for ind,val in enumerate(lines) if val == lineA]

This comprehension is O(n), I believe, since it checks each value in lines once. outs then holds the indices at which lines contains a list equal to lineA, so you can work out which line was matched. It is also a one-liner, which is about as 'fast to write' as you can get.
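For a concrete run, here is a small sketch with three of the question's lines standing in for the full fifty (the shortened lines list and the matched variable are assumptions for illustration):

```python
lineA = [1, 2, 3, 4, 5, 6, 7]

# Three sample lines from the question instead of all fifty.
lines = [
    [1, 33, 40, 44, 45, 1, 2],
    [2, 13, 22, 41, 50, 8, 9],
    [1, 2, 3, 4, 5, 6, 7],
]

# Collect the positions of every line equal to lineA.
outs = [ind for ind, val in enumerate(lines) if val == lineA]
print(outs)

# The indices can be used to look the matching lines back up.
matched = [lines[ind] for ind in outs]
```

Here only the last line matches, so outs holds the single index 2.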

A.Wan
  • This is a better answer than Daren Thomas' because it stores the index instead of just the same list over and over. – sage88 May 20 '16 at 20:37
  • I agree, seems like having the index could be useful - am I the only one who doesn't get the use case of this question at all? – Daren Thomas May 23 '16 at 08:24
0

NumPy's array_equal may improve on the runtime of the other answers here when the lines are large numeric arrays; for short lists the conversion overhead can outweigh any gain. Frankly, you can't do better than comparing the elements one by one in order (lexicographically), which is what the list == operator does, unless this is a streaming list or there is some other aspect you are omitting.

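A benchmark on my very slow computer; you should be able to get a larger difference:

import timeit

# Pure-Python version: compare each row to b with the list == operator.
testeq = """
a = [list(range(randint(0, 10), 100)) for x in range(500)]
b = list(range(5, 100))
c = [row for row in a if row == b]
"""

# NumPy version: the rows have different lengths, so keep a list of
# arrays and test each row with np.array_equal.
testnpeq = """
a = [np.array(range(randint(0, 10), 100)) for x in range(500)]
b = np.array(range(5, 100))
c = [row for row in a if np.array_equal(row, b)]
"""

setup = "from random import randint\nimport numpy as np"

print('using == operator: %s' % timeit.timeit(testeq, setup=setup, number=10000))
print('using np operator: %s' % timeit.timeit(testnpeq, setup=setup, number=10000))

using == operator 13.5805370808

using np operator 12.8217720985

Note: the figures above come from the snippet as first posted, in which both timeit calls ran testeq, so they do not measure the NumPy version. Both runs also build different random arrays inside the timed code; you should get faster and more comparable run times if you build the same array once in setup.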

tl;dr: the only way to do the comparison is lexicographically, element by element; use numpy / C if you need it faster.

Eiyrioü von Kauyf