20

Find minimum window width in string x that contains all characters of another string y. For example:

String x = "coobdafceeaxab"
String y = "abc"

The answer should be 5, because the shortest substring in x that contains all three letters of y is "bdafc".

I can think of a naive solution with complexity O(n^2 * log(m)), where n = len(x) and m = len(y). Can anyone suggest a better solution? Thanks.

Update: now think of it, if I change my set to tr1::unordered_map, then I can cut the complexity down to O(n^2), because insertion and deletion should both be O(1).

meowgoesthedog
  • 14,670
  • 4
  • 27
  • 40
grokus
  • 18,046
  • 9
  • 29
  • 35
  • Are those really interview questions? – Andrei Ciobanu Aug 30 '10 at 23:03
  • @Andrei, now think of it I shouldn't have tagged this "interview-questions", I saw this question somewhere on the Internet, I don't really know if it was an interview question or not. Sorry for the confusion. – grokus Aug 31 '10 at 00:10
  • possible duplicate of [Google search results: How to find the minimum window that contains all the search keywords?](http://stackoverflow.com/questions/2734313/google-search-results-how-to-find-the-minimum-window-that-contains-all-the-sear) – user Jul 10 '13 at 10:59

7 Answers7

23

time: O(n) (One pass)
space: O(k)

This is how I would do it:
Create a hash table for all the characters from string Y. (I assume all characters are different in Y).

First pass:
Start from first character of string X.
update hash table, for exa: for key 'a' enter location (say 1).
Keep on doing it until you get all characters from Y (until all key in hash table has value).
If you get some character again, update its newer value and erase older one.

Once you have first pass, take smallest value from hash table and biggest value.
Thats the minimum window observed so far.

Now, go to next character in string X, update hash table and see if you get smaller window.


Edit:

Lets take an example here:
String x = "coobdafceeaxab"
String y = "abc"

First initialize a hash table from characters of Y.
h[a] = -1
h[b] = -1
h[c] = -1

Now, Start from first character of X.
First character is c, h[c] = 0
Second character (o) is not part of hash, skip it.
..
Fourth character (b), h[b] = 3
..
Sixth character(a), enter hash table h[a] = 5.
Now, all keys from hash table has some value.
Smallest value is 0 (of c) and highest value is 5 (of a), minimum window so far is 6 (0 to 5).
First pass is done.

Take next character. f is not part of hash table, skip it.
Next character (c), update hash table h[c] = 7.
Find new window, smallest value is 3 (of b) and highest value is 7 (of c).
New window is 3 to 7 => 5.

Keep on doing it till last character of string X.

I hope its clear now.


Edit

There are some concerns about finding max and min value from hash.
We can maintain sorted Link-list and map it with hash table.
Whenever any element from Link list changes, it should be re-mapped to hash table.
Both these operation are O(1)

Total space would be m+m


Edit

Here is small visualisation of above problem: For "coobdafceeaxab" and "abc"

step-0:
Initial doubly linked-list = NULL
Initial hash-table = NULL

step-1:
Head<->[c,0]<->tail
h[c] = [0, 'pointer to c node in LL']

step-2:
Head<->[c,0]<->[b,3]<->tail
h[c] = [0, 'pointer to c node in LL'], h[b] = [3, 'pointer to b node in LL'],

Step-3:
Head<->[c,0]<->[b,3]<->[a,5]<->tail
h[c] = [0, 'pointer to c node in LL'], h[b] = [3, 'pointer to b node in LL'], h[a] = [5, 'pointer to a node in LL']
Minimum Window => difference from tail and head => (5-0)+1 => Length: 6

Step-4:
Update entry of C to index 7 here. (Remove 'c' node from linked-list and append at the tail)
Head<->[b,3]<->[a,5]<->[c,7]<->tail
h[c] = [7, 'new pointer to c node in LL'], h[b] = [3, 'pointer to b node in LL'], h[a] = [5, 'pointer to a node in LL'],
Minimum Window => difference from tail and head => (7-3)+1 => Length: 5

And so on..

Note that above Linked-list update and hash table update are both O(1).
Please correct me if I am wrong..


Summary:

TIme complexity: O(n) with one pass
Space Complexity: O(k) where k is length of string Y

Jack
  • 1,398
  • 3
  • 16
  • 26
  • I don't actually understand what you are describing. Can you take my example above and illustrate how your arrive at the correct answer? – grokus Aug 29 '10 at 02:34
  • @Jack, +1 for you, however I must say each time you look up min and max values in your hash, it takes O(m), so your overall complexity is O(nm), not O(n) as you advertised. – grokus Aug 29 '10 at 20:30
  • Well, there is a way you can maintain ordered list where you can keep track of first and last occurrence and map it to hash. whenever you change key/value, update its value in list using hash. This is O(1) due to hash. – Jack Aug 30 '10 at 06:21
  • Maintain Link list and map with hash. Total complexity will be O(n), space will be m + m. – Jack Aug 30 '10 at 10:12
  • Check out my answer for implementations in Java and C++ of this method. – Sheldon L. Cooper Aug 31 '10 at 16:31
  • +1 for solution and having ordered list of indexes - I was thinking pretty much the same approach. – DK. Sep 01 '10 at 22:25
  • 1
    @Jack, I will accept this answer if I can convince myself all operations of insert/delete/min/max are O(1). I'm not a data structure or algorithms guru, so could you explain how you maintain your hash and sorted linked list at the same time O(1) operations? Perhaps write some pseudo code to illustrate? Or if you can can point me to some online resource it'd be good too. – grokus Sep 03 '10 at 16:39
  • 3
    Nice algorithm. @grokus, the hash elements point back to the linked list elements, so that when you see a new character you can quickly find and delete it from the linked list. Insertions to the linked list always happen at the tail, which is also O(1) if you use a doubly linked list, and this keeps the list in sorted order because we're progressing from left to right. (Deleting elements from the middle doesn't destroy the order.) One point: you only need to find the minimum position, since the maximum position will always be the character you just added. – j_random_hacker Dec 11 '12 at 10:44
7

I found this very nice O(N) time complexity version here http://leetcode.com/2010/11/finding-minimum-window-in-s-which.html, and shortened it slightly (removed continue in a first while , which allowed to simplify condition for the second while loop). Note, that this solution allows for duplicates in the second string, while many of the above answers do not.

private static String minWindow(String s, String t) {
    int[] needToFind = new int[256];
    int[] hasFound = new int[256];

    for(int i = 0; i < t.length(); ++i) {
       needToFind[t.charAt(i)]++;
    }

    int count = 0;
    int minWindowSize = Integer.MAX_VALUE;
    int start = 0, end = -1;
    String window = "";

    while (++end < s.length()) {
        char c = s.charAt(end);
        if(++hasFound[c] <= needToFind[c]) {
           count++;
        }

        if(count < t.length()) continue;
        while (hasFound[s.charAt(start)] > needToFind[s.charAt(start)]) {
           hasFound[s.charAt(start++)]--;
        }

        if(end - start + 1 < minWindowSize) {
           minWindowSize = end - start + 1;
           window = s.substring(start, end + 1);
        }
    }
    return window;
}
javabrew
  • 306
  • 4
  • 7
  • Also, this is not `O(n)`, because your second while loops **re-iterates** the string `s`, by incrementing `start++`. As long as you go through a character in original string `s` more than once in a loop, the complexity will be more than `O(n)` – Om Shankar Jan 22 '16 at 19:54
5

Here's my solution in C++:

int min_width(const string& x, const set<char>& y) {
  vector<int> at;
  for (int i = 0; i < x.length(); i++)
    if (y.count(x[i]) > 0)
      at.push_back(i);

  int ret = x.size();
  int start = 0;
  map<char, int> count;

  for (int end = 0; end < at.size(); end++) {
    count[x[at[end]]]++;
    while (count[x[at[start]]] > 1)
      count[x[at[start++]]]--;
    if (count.size() == y.size() && ret > at[end] - at[start] + 1)
      ret = at[end] - at[start] + 1;
  }
  return ret;
}

Edit: Here's an implementation of Jack's idea. It's the same time complexity as mine, but without the inner loop that confuses you.

int min_width(const string& x, const set<char>& y) {
  int ret = x.size();
  map<char, int> index;
  set<int> index_set;

  for (int j = 0; j < x.size(); j++) {
    if (y.count(x[j]) > 0) {
      if (index.count(x[j]) > 0)
        index_set.erase(index[x[j]]);
      index_set.insert(j);
      index[x[j]] = j;
      if (index.size() == y.size()) {
        int i = *index_set.begin();
        if (ret > j-i+1)
          ret = j-i+1;
      }
    }
  }
  return ret;
}

In Java it can be implemented nicely with LinkedHashMap:

static int minWidth(String x, HashSet<Character> y) {
    int ret = x.length();
    Map<Character, Integer> index = new LinkedHashMap<Character, Integer>();

    for (int j = 0; j < x.length(); j++) {
        char ch = x.charAt(j);
        if (y.contains(ch)) {
            index.remove(ch);
            index.put(ch, j);
            if (index.size() == y.size()) {
                int i = index.values().iterator().next();
                if (ret > j - i + 1)
                    ret = j - i + 1;
            }
        }
    }
    return ret;
}

All operations inside the loop take constant time (assuming hashed elements disperse properly).

Sheldon L. Cooper
  • 3,230
  • 19
  • 13
  • The time complexity is linear in the size of the string "x" (assuming a constant size alphabet.) This implementation assumes there's always a solution. – Sheldon L. Cooper Aug 28 '10 at 20:30
  • @Sheldon, I tested your code and it does produce correct result, however you have a while loop inside a for loop, so the complexity looks to be O(n^2) to me. – grokus Aug 29 '10 at 02:49
  • 1
    No, it's O(n). The while loop executes at most n iterations OVERALL, since "start" is incremented at each iteration. – Sheldon L. Cooper Aug 29 '10 at 04:59
  • In other words, you're claiming that the code works but the while loop iterates more than n times overall. Which implies that the variable "start" would have a value greater than n. Then, the vector "at" is accessed outside its boundaries. Hence, the code is broken. Contradiction, that comes from assuming that the while loop iterates more than n times overall. – Sheldon L. Cooper Aug 29 '10 at 05:35
  • Only looking at the first loop, this code is already O(n log m). – Nabb Aug 29 '10 at 07:19
  • > The time complexity is linear in the size of the string "x" (**assuming a constant size alphabet.**) – Sheldon L. Cooper Aug 29 '10 at 13:09
  • 1
    You can always implement "count" as an hash map or an array, if you don't want to assume that. – Sheldon L. Cooper Aug 29 '10 at 13:16
  • +1 for you, you should use tr1::unordered_map instead of map or set. – grokus Aug 29 '10 at 20:31
3

There is an O(n solution to this problem). It very well described in this article. http://www.leetcode.com/2010/11/finding-minimum-window-in-s-which.html Hope it helps.

nmd
  • 823
  • 1
  • 7
  • 17
1

This is my solution in C++, just for reference.

Update: originally I used std::set, now I change it to tr1::unordered_map to cut complexity down to n^2, otherwise these two implementations look pretty similar, to prevent this post from getting too long, I only list the improved solution.

#include <iostream>
#include <tr1/unordered_map>
#include <string>

using namespace std;
using namespace std::tr1;

typedef tr1::unordered_map<char, int> hash_t;

// Returns min substring width in which sentence contains all chars in word
// Returns sentence's length + 1 if not found
size_t get_min_width(const string &sent, const string &word) {
    size_t min_size = sent.size() + 1;
    hash_t char_set; // char set that word contains
    for (size_t i = 0; i < word.size(); i++) {
        char_set.insert(hash_t::value_type(word[i], 1));
    }
    for (size_t i = 0; i < sent.size() - word.size(); i++) {
        hash_t s = char_set;
        for (size_t j = i; j < min(j + min_size, sent.size()); j++) {
            s.erase(sent[j]);
            if (s.empty()) {
                size_t size = j - i + 1;
                if (size < min_size) min_size = size;
                break;
            }
        }
    }
    return min_size;
}

int main() {
    const string x = "coobdafceeaxab";
    const string y = "abc";
    cout << get_min_width(x, y) << "\n";
}
grokus
  • 18,046
  • 9
  • 29
  • 35
0

An implementation of Jack's idea.

public int smallestWindow(String str1, String str2){
        if(str1==null || str2==null){
            throw new IllegalArgumentException();
        }
        Map<String, Node> map=new HashMap<String, Node>();
        Node head=null, current=null;

        for(int i=0;i<str1.length();i++){
            char c=str1.charAt(i);
            if(head==null){
                head=new Node(c);
                current=head;
                map.put(String.valueOf(c), head);
            }
            else{
                current.next=new Node(c);
                current.next.pre=current;
                current=current.next;
                map.put(String.valueOf(c), current);
            }
        }

        Node end=current;
        int min=Integer.MAX_VALUE;
        int count=0;

        for(int i=0;i<str2.length();i++){
            char c = str2.charAt(i);
            Node n=map.get(String.valueOf(c));
            if(n!=null){
                if(n.index==Integer.MAX_VALUE){
                    count++;
                }
                n.index=i;
                if(n==head){
                    Node temp=head;
                    head=head.next;
                    if(head==null){//one node
                        return 1;
                    }
                    head.pre=null;        
                    temp.pre=end;
                    end.next=temp;
                    temp.next=null;
                    end=temp;
                }
                else if(end!=n){
                    n.pre.next=n.next;
                    n.next.pre=n.pre;
                    n.pre=end;
                    n.next=null;
                    end.next=n;
                    end=n;
                }
                if(count==str1.length()){
                    min=Math.min(end.index-head.index+1, min);
                }
            }
        }
        System.out.println(map);
        return min;
    }
Trying
  • 14,004
  • 9
  • 70
  • 110
0

Simple java solution using the sliding window. Extending NitishMD's idea above:

public class StringSearchDemo {

public String getSmallestSubsetOfStringContaingSearchString(String toMatch,
        String inputString) {
    if (inputString.isEmpty() || toMatch.isEmpty()) {
        return null;
    }
    //      List<String> results = new ArrayList<String>(); // optional you can comment this out

    String smallestMatch = "";
    //      String largestMatch = ""; 
    int startPointer = 0, endPointer = 1;
    HashMap<Character, Integer> toMatchMap = new HashMap<Character, Integer>();

    for (char c : toMatch.toCharArray()) {
        if (toMatchMap.containsKey(c)) {
            toMatchMap.put(c, (toMatchMap.get(c) + 1));
        } else {
            toMatchMap.put(c, 1);
        }
    }

    int totalCount = getCountofMatchingString(toMatchMap, toMatch);
    for (int i = 0; i < inputString.length();) {
        if (!toMatchMap.containsKey(inputString.charAt(i))) {
            endPointer++;
            i++;
            continue;
        }
        String currentSubString = inputString.substring(startPointer,
                endPointer);
        if (getCountofMatchingString(toMatchMap, currentSubString) >= totalCount) {
            //              results.add(currentSubString); // optional you can comment this out
            if (smallestMatch.length() > currentSubString.length()) {
                smallestMatch = currentSubString;
            } else if (smallestMatch.isEmpty()) {
                smallestMatch = currentSubString;
            }
            //              if (largestMatch.length() < currentSubString.length()) {
            //                  largestMatch = currentSubString;
            //              }
            startPointer++;
        } else {
            endPointer++;
            i++;
        }

    }
    //      System.out.println("all possible combinations = " + results); // optional, you can comment this out
    //      System.out.println("smallest result = " + smallestMatch);
    //      System.out.println("largest result = " + largestMatch);
    return smallestMatch;
}

public int getCountofMatchingString(HashMap<Character, Integer> toMatchMap,
        String toMatch) {
    int match = 0;
    HashMap<Character, Integer> localMap = new HashMap<Character, Integer>();

    for (char c : toMatch.toCharArray()) {
        if (toMatchMap.containsKey(c)) {
            if (localMap.containsKey(c)) {
                if (localMap.get(c) < toMatchMap.get(c)) {
                    localMap.put(c, (localMap.get(c) + 1));
                    match++;
                }
            } else {
                localMap.put(c, 1);
                match++;
            }
        }
    }
    return match;
}

public static void main(String[] args) {
    String inputString = "zxaddbddxyy由ccbbwwaay漢字由来"; 
    String matchCriteria = "a由";
    System.out.println("input=" + matchCriteria);
    System.out.println("matchCriteria=" + inputString);
    String result = (new StringSearchDemo())
            .getSmallestSubsetOfStringContaingSearchString(matchCriteria, inputString);
    System.out.println("smallest possbile match = " + result);
}

}

Navneet Kumar
  • 909
  • 7
  • 6