2

If I have a function to, for example, check if list1 is a sublist of list2, which option is better:

Option 1:

def isSublist1(list1,list2):
  "This fuction checks if list1 is a sublist of list2."
  for i in range(len(list2)):
    part = list2[i:]  # part is a list with all the elements from i to the end of list2
    if len(part)<len(list1):
      return False
    if list1==part[:len(list1)]:  # if list1 is in the beginning of part 
      return True
  return False

Or option 2:

def isSublist2(list1,list2):
  "This fuction checks if list1 is a sublist of list."
  for i in range(len(list2)):
    if len(list2[i:])<len(list1):
      return False
    if list1==list2[i:][:len(list1)]:  # if list1 is in the beginning of list2[i:] (part)
      return True
  return False

In option 1, I use a variable called part to store a section of list2 however, in option 2, part is not a variable, the section of list2 is calculated when it is needed. Is option 1 faster? Does it spend more space?

My problem is not with this function specifically, I know there are other ways to implement this function.

I would like to know which one is the best practice in a loop: using a variable to avoid calculating several times the same things or not. Does the answer depend on the complexity and frequency of the calculation?

t3m2
  • 366
  • 1
  • 15
  • 1
    thats a good thing to consider. The answer generally depends on your use case. A rule of thumb though, time complexity is generally a more pressing problem than space complexity in today's era, which favours storing intermediate values over recalculation. However, exceptions are always present. – Paritosh Singh Apr 05 '19 at 19:55
  • 1
    It's usually a good idea, but it depends on how expensive it is to recalculate, and how verbose the expression you're replacing is. Giving a meaningful name to a complex subexpression makes it easier to understand. – Barmar Apr 05 '19 at 19:56
  • 2
    Also, how many times you refer to the expression in the loop. In your example, if `list2` is not likely to be very long, it may not be worth it, since you only use `part` twice, and `list2[i]` is not much longer than `part`. – Barmar Apr 05 '19 at 19:58
  • 5
    You should generally consider readability before worrying about performance. [Premature optimization is the root of all evil](http://c2.com/cgi/wiki?PrematureOptimization) – Barmar Apr 05 '19 at 19:58
  • Btw comments should look `'''like this'''` or `"""like this"""`. What you have is `"not a comment"` but a `str` (string). – Stefan Falk Apr 05 '19 at 20:12
  • 1
    @displayname It doesn't matter which quotes you use; it's a doc string, not a comment. Comments start with `#`. – chepner Apr 05 '19 at 20:12
  • @chepner wow.. i actually did not know that.. although it's recommended to use tripple double-quotes. – Stefan Falk Apr 05 '19 at 20:15
  • @displayname [Question about how to comment functions](https://stackoverflow.com/questions/2357230/what-is-the-proper-way-to-comment-functions-in-python) – t3m2 Apr 05 '19 at 20:18
  • 2
    Never let Donald Knuth's pithy quote from 1974 deter you from asking questions like this. The premature optimization he was referring to 45 years ago is not what you're doing. His much misused paper is saying (accurately) that you shouldn't micro-optimize code unless you need to. You're not doing that. You're looking to see if there are programming strategies that will let you write more efficient code. Unrolling your two for loops to try to get an extra 10% improvement is premature optimization. What you asked turns out to be a strategy to half your computation time, for free. – Scott Mermelstein Apr 05 '19 at 20:36

2 Answers2

2

Storing local is better because the lookup that python does is faster. It even pays off to store functions locally.

Performance questions are best answered by timing them - you can measure this using timeit:

import timeit

def noTempFunc():
    for _ in range(200):
        max([1,4,5,6])

def tempFunc():
    m = max
    for _ in range(200):
        m([1,4,5,6])


print(timeit.timeit(noTempFunc, number=1000))   #  0.055301458000030834
print(timeit.timeit(tempFunc, number=1000))     #  0.049811941999905684 : 11% faster

In this case the max() of the global context only has to be looked up once, and further lookups are done locally which is ~ 11% faster based on these numbers.

It pays up if you use your local m() multiple times.


In your case it would be sensible to cache the len_list1 = len(list1) - as it is used lots of time and does not change.

To make it more readable, you can consider:

def isSublist(list1, list2):
    """Checks if list2 is a sublist of list1"""
    len_part = len(list2)  # reused inside the list comp, only "calulated" once
    return any( x == list2 for x in (list1[i:i+len_part] 
                                     for i in range(len(list1)-len_part+1) ))


print(isSublist([1,2,3,4],[1]))
print(isSublist([1,2,3,4],[2,3]))
print(isSublist([1,2,3,4],[1,2,4]))
print(isSublist([1,2,3,4],[1,2,3,4]))

Output:

True
True
False
True

Lookups:


Your faster version with cached length as well (based on Scott Mermelstein answer):

def isSublist1a(list1,list2): # cached length as well
    l1 = len(list1)
    for i in range(len(list2)):
        part=list2[i:]
        if len(part)<l1:
            return False
        if list1==part[:l1]:
            return True
    return False

list1=list(range(1000))
list2=list(range(400,420))
import timeit
print(timeit.timeit('isSublist1(list2,list1)', globals=globals(),number=1000))
print(timeit.timeit('isSublist1a(list2,list1)', globals=globals(),number=1000))
print(timeit.timeit('isSublist2(list2,list1)', globals=globals(),number=1000))

delivers (2 executions):

0.08652938600062043  # cached part
0.08017484299944044  # cached part + cached list1 lenght - slightly faster
0.15090413599955355  # non-cached version

0.8882850420004615   # cached part
0.8294611960000111   # cached part + cached list1 lenght - slightly faster
1.5524438030006422   # non-cached version
Patrick Artner
  • 50,409
  • 9
  • 43
  • 69
  • 1
    Ok, I understood, but this checks if list2 is a sublist of list1, returning false if they are equal. – t3m2 Apr 05 '19 at 20:37
  • @t3 I check if list2 is a sublist of list1 - the off by 1 is fixed and returns True now if both lists are equal – Patrick Artner Apr 05 '19 at 20:39
2

Supplementary to Patrick's excellent answer, let's try timeit with your actual code:

>>> def isSublist1(list1,list2):                                                                                                                                                                                                                              
...   for i in range(len(list2)):                                                                                                                                                                                                                             
...     part=list2[i:]                                                                                                                                                                                                                                        
...     if len(part)<len(list1):                                                                                                                                                                                                                              
...       return False                                                                                                                                                                                                                                        
...     if list1==part[:len(list1)]:
...       return True                                                                                                                                                                                                                                         
...   return False                                                                                                                                                                                                                                            
... 
>>> def isSublist2(list1,list2):
...   for i in range(len(list2)):
...     if len(list2[i:])<len(list1):
...       return False
...     if list1==list2[i:][:len(list1)]:
...       return True
...   return False
... 
>>> list1=list(range(10000))
>>> list2=list(range(4000,4020))
>>> import timeit
>>> timeit.timeit('isSublist1(list2,list1)', globals=globals(),number=100)
6.420147094002459
>>> timeit.timeit('isSublist2(list2,list1)', globals=globals(),number=100)
12.455138996010646

So on my system, the amount of time needed with your temporary variable is just about half the time needed without it.

I don't know the nature of your lists and sublists; you may want to change what list1 and list2 are to be a good reflection of how your code will be used, but at least on my side, it looks like saving that temporary variable is a very good idea.

Incidentally, let's do another interesting experiment:

>>> def isSublist3(list1,list2):
...   ln = len(list1)
...   for i in range(len(list2)):
...     part=list2[i:]
...     if len(part)<ln:
...       return False
...     if list1==part[:ln]:
...       return True
...   return False
... 
>>> timeit.timeit('isSublist1(list2,list1)',globals=globals(),number=100); timeit.timeit('isSublist3(list2,list1)',globals=globals(),number=100)
6.549526696035173
6.481004184985068

I ran it a few more times to see what I'd get:

6.470875242026523 6.463623657007702

6.151073662971612 5.787795798969455

5.685607994964812 5.655005165026523

6.439315696014091 6.372227535001002

Note that each time, the cached length takes less time than the non-cached length, though you don't get nearly the performance improvement you had by caching the slicing.

Note also that it's important not to draw too many conclusions from a single run of timeit. There are a lot of other variables affecting the timing, (in my case, pretty clearly, something happened to make it drop from 6.4 to 5.7 - in the middle of a single test!) so if you want to come up with a good rule you can count on, test several times to make sure you get consistent results.

Scott Mermelstein
  • 15,174
  • 4
  • 48
  • 76