0

I'm doing leetcode 128 (longest consecutive sequence). This is the problem: problem picture

I wrote 2 working solutions, both using DFS.

The 1st solution uses lrucache to cache the DFS function

The 2nd solution uses a dictionary to do the same thing (I think).

Solution 1 is significantly faster (~500ms vs ~4000ms) but I don't really understand why. To my understanding, they are essentially doing the same thing. What am I missing?

Here are the 2 solutions:

Solution 1 (LRU Cache) (Faster):

def longestConsecutive(self, nums: List[int]) -> int:
    
    graph = dict()
    for num in nums:
        graph[num] = num+1
    
    @functools.lru_cache(maxsize = None)
    def dfs(num):
        if num in graph:
            return 1 + dfs(graph[num])
        return 0
    
    longest = 0
    for num in nums:
        longest = max(longest, dfs(num))
            
    return longest

Solution 2 (Dictionary Lookup) (Slower):

def longestConsecutive(self, nums: List[int]) -> int:
    
    graph = dict()
    for num in nums:
        graph[num] = num+1
    
    def dfs(num):
        if num in graph:
            return 1 + dfs(graph[num])
        return 0
    
    longest = 0
    for num in nums:
        if num-1 not in graph:
            longest = max(longest, dfs(num))
            
    return longest
erickcgt
  • 1
  • 1
  • Share the code here please. – memoricab Oct 11 '21 at 06:54
  • @memoricab sorry, done now – erickcgt Oct 11 '21 at 06:57
  • Did you read this one? https://stackoverflow.com/questions/49883177/how-does-lru-cache-from-functools-work – memoricab Oct 11 '21 at 07:00
  • @memoricab I see 1 comment on that post that mentions that lru_cache_wrapper() is implemented in C, which makes it faster than Python dictionaries. I can't find any further information online to corroborate / provide more detail on this comment. Do you happen to know more about this? – erickcgt Oct 11 '21 at 07:11

1 Answers1

0

The code 'does the same thing' when all numbers are unique. With caching or uniqueness filtering, the runtime is guaranteed to be linear, but currently is quadratic.

This problem is finding the maximum length of a path in a directed graph, where the graph is composed of disjoint unions of paths. The DFS just follows edges until it reaches a sink.

The first version guarantees each path is walked exactly once using caching; the second version tries to guarantee each path is walked exactly once by only calling DFS if we are at the start of a path.

Consider the following input:

nums = [1,2, ..., n] + [0]*n

Then we walk the full path of length n once for each 0, so n walks in total. You can fix the second version to be O(n) by adding 5 characters to remove duplicates:

    for num in set(nums):
        if num-1 not in graph:
            longest = max(longest, dfs(num))
kcsquared
  • 5,244
  • 1
  • 11
  • 36