Longest Substring Without Repeating Characters

Question

I have spent hours on the LeetCode problem "Longest Substring Without Repeating Characters":

Longest Substring Without Repeating Characters

Medium

Given a string, find the length of the longest substring without repeating characters.

Example 1:
Input: "abcabcbb"
Output: 3 
Explanation: The answer is "abc", with the length of 3. 
Example 2:
Input: "bbbbb"
Output: 1
Explanation: The answer is "b", with the length of 1.
Example 3:
Input: "pwwkew"
Output: 3
Explanation: The answer is "wke", with the length of 3. 
             Note that the answer must be a substring, "pwke" is a subsequence and not a substring.

The problem can be solved with two pointers, combined with the idea from Kadane's algorithm of tracking a local and a global best over subarrays:

import logging

class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        logging.debug(f"{list(enumerate(s))}")
        # slo: index of the first character of the current duplicate-free window
        # fas: index one past the last character of that window
        # relation: length = fas - slo
        slo = fas = 0
        current = set()  # characters currently inside the window
        glo = loc = 0    # global best length and local (current) length
        while fas < len(s):
            logging.debug(f"pre_current: {current}, slow: {slo}, fast: {fas}")
            if s[fas] not in current:
                current.add(s[fas])
                fas += 1  # advance first so that fas - slo is the window length
                loc = fas - slo
                glo = max(glo, loc)
            else:
                # shrink the window from the left until the duplicate is gone
                current.remove(s[slo])
                slo += 1
            logging.debug(f"post_current: {current}, slow: {slo}, fast: {fas} \n")
        return glo

TestCase

    def test_g(self):
        s = "abccefg"
        answer = 4
        check = self.solution.lengthOfLongestSubstring(s)
        self.assertEqual(answer, check)

The solution is easy to follow: it moves slow and fast alternately.

$ python 3.LongestSubstring.py MyCase.test_g
DEBUG [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'c'), (4, 'e'), (5, 'f'), (6, 'g')]
DEBUG pre_current: set(), slow: 0, fast: 0
DEBUG post_current: {'a'}, slow: 0, fast: 1 

DEBUG pre_current: {'a'}, slow: 0, fast: 1
DEBUG post_current: {'b', 'a'}, slow: 0, fast: 2 

DEBUG pre_current: {'b', 'a'}, slow: 0, fast: 2
DEBUG post_current: {'b', 'c', 'a'}, slow: 0, fast: 3 

DEBUG pre_current: {'b', 'c', 'a'}, slow: 0, fast: 3
DEBUG post_current: {'b', 'c'}, slow: 1, fast: 3 

DEBUG pre_current: {'b', 'c'}, slow: 1, fast: 3
DEBUG post_current: {'c'}, slow: 2, fast: 3 

DEBUG pre_current: {'c'}, slow: 2, fast: 3
DEBUG post_current: set(), slow: 3, fast: 3 

DEBUG pre_current: set(), slow: 3, fast: 3
DEBUG post_current: {'c'}, slow: 3, fast: 4 

DEBUG pre_current: {'c'}, slow: 3, fast: 4
DEBUG post_current: {'c', 'e'}, slow: 3, fast: 5 

DEBUG pre_current: {'c', 'e'}, slow: 3, fast: 5
DEBUG post_current: {'e', 'f', 'c'}, slow: 3, fast: 6 

DEBUG pre_current: {'e', 'f', 'c'}, slow: 3, fast: 6
DEBUG post_current: {'g', 'e', 'f', 'c'}, slow: 3, fast: 7 

.
----------------------------------------------------------------------
Ran 1 test in 0.001s

In conclusion, the solution employs the two-pointer technique and the idea of Kadane's algorithm. As a beginner, I figured it was possible to work it out eventually after spending hours on debugging.

However, I then read this elegant solution:


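class SolutionA:
    def lengthOfLongestSubstring(self, s):
        """
        :type s: str
        :rtype: int
        """
        # slo is the index of the first character of the duplicate-free window
        # fas is the index of the last character of the duplicate-free window
        lookup, glo, slo = {}, 0, 0  # lookup maps each character to its latest index
        for fas, ch in enumerate(s):
            if ch in lookup:
                # never move slo backwards: the earlier occurrence may lie left of the window
                slo = max(slo, lookup[ch] + 1)
            # s[slo..fas] is duplicate-free here, so it is always a candidate
            glo = max(glo, fas - slo + 1)
            lookup[ch] = fas  # record the latest index of ch (new or repeated)
        return glo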

The solution is very smart; I honestly don't believe one could design it in a few hours without having seen it before.

It uses a hash map, applies the idea of Kadane's algorithm twice, and has a very concise structure.

Is this a common technique, like two pointers? What is it called?

So basically your question is what the second approach you show is called? — theblackips, Apr 22 '19 at 11:57
Not sure about kadane, but the 2 pointers processes each subarray like a window. So, I presume it's a [sliding window technique](https://stackoverflow.com/questions/8269916/what-is-sliding-window-algorithm-examples). — nice_dev, Apr 22 '19 at 13:38

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

As mentioned in the comments of the solution in the 2nd approach:

slow is the first which not duplicate in a subarray

fast is the last which is not duplicate in a subarray

It uses 2 pointers to keep track of a window that contains no duplicate characters. If a duplicate is found, it updates the pointers accordingly.
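To see how the pointers move when a duplicate shows up, here is a minimal tracing sketch. The name `trace_window` and the tuple layout are hypothetical, chosen only for this illustration; the check `last_seen[ch] >= slow` is meant to be equivalent to the `max(slo, lookup[ch]+1)` update in the question's second solution.

```python
def trace_window(s):
    """Yield (fast, char, slow, window) after each character is processed."""
    last_seen = {}  # char -> index of its most recent occurrence
    slow = 0
    for fast, ch in enumerate(s):
        # Jump forward only if the previous occurrence is inside the window;
        # a stale index left of `slow` must not pull the window back.
        if ch in last_seen and last_seen[ch] >= slow:
            slow = last_seen[ch] + 1
        last_seen[ch] = fast
        yield fast, ch, slow, s[slow:fast + 1]


for step in trace_window("abba"):
    print(step)
# (0, 'a', 0, 'a')
# (1, 'b', 0, 'ab')
# (2, 'b', 2, 'b')
# (3, 'a', 2, 'ba')
```

On the last step the earlier 'a' sits at index 0, which is already outside the window, so slow stays at 2 instead of moving back to 1.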

In other words, it maintains a window and slides it forward to see how far it can go while keeping the non-repeating-characters property. So, this method is called the sliding window technique.
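As a rough sketch of the same sliding window, the variant below returns the substring itself rather than only its length. The function name `longest_unique_substring` is made up for this example, and ties are resolved in favour of the earliest window, which is an arbitrary choice.

```python
def longest_unique_substring(s):
    """Return the first longest substring of s without repeating characters."""
    last_seen = {}  # char -> index of its most recent occurrence
    slow = 0        # left edge of the current window
    best_start, best_len = 0, 0
    for fast, ch in enumerate(s):
        # Slide the left edge past the previous occurrence of ch, if any,
        # without ever moving it backwards.
        slow = max(slow, last_seen.get(ch, -1) + 1)
        last_seen[ch] = fast
        if fast - slow + 1 > best_len:
            best_start, best_len = slow, fast - slow + 1
    return s[best_start:best_start + best_len]


print(longest_unique_substring("pwwkew"))  # wke
```

Each character is visited once and the dictionary lookups are constant time on average, so the whole scan is linear in the length of the string.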

This may look trivial for strings limited to the 26 alphabetic characters, but it is very useful for strings drawn from a much larger character set, such as UTF-8 text.
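One way to picture the difference, as a sketch under assumptions: with only a-z, a 26-slot list is enough to remember last positions, while a dictionary handles any character. Both function names below are hypothetical. Note that a Python `str` is iterated by Unicode code point, so "character" here means code point, not a user-perceived grapheme.

```python
def length_fixed_alphabet(s):
    """Sliding window for lowercase a-z only, using a 26-slot list."""
    last_seen = [-1] * 26
    slow = best = 0
    for fast, ch in enumerate(s):
        i = ord(ch) - ord("a")
        if not 0 <= i < 26:
            raise ValueError(f"unsupported character: {ch!r}")
        slow = max(slow, last_seen[i] + 1)
        last_seen[i] = fast
        best = max(best, fast - slow + 1)
    return best


def length_any_alphabet(s):
    """Same window, but a dict lets it handle any Unicode code point."""
    last_seen = {}
    slow = best = 0
    for fast, ch in enumerate(s):
        slow = max(slow, last_seen.get(ch, -1) + 1)
        last_seen[ch] = fast
        best = max(best, fast - slow + 1)
    return best


print(length_fixed_alphabet("abcabcbb"))  # 3
print(length_any_alphabet("日本語日本"))    # 3
```

The dictionary only stores the characters that actually occur, so its size grows with the input rather than with the size of the character set.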
