Finding a minimum length substring in string S which contains all characters from string T using Hash Table in O(n)

Question

I know that this question has already been asked more than once.My doubt is not finding the solution of this problem, but the correct complexity. I read an answer here:

Minimum window width in string x that contains all characters in string y
In this solution,the complexity proposed is O(n) using hash table.Now till mapping the characters of string T in hash table is fine,but when it comes to finding the minimum and maximum element from the hash table,in almost every answer,I read a linked list is used and they write that the complexity in updating a linked list is O(1).This is where the doubt is.

Please consider the following example:

S:abcbae

T:abc

Initially t["a"],t["b"] and t["c"] will be -1,after 3 passes the values will be 0,1,2 respectively(in hash table) and the double linked list will be containing the elements [(a,0)-(b,1)-(c,2)].

Now the fourth character in string S is b,so before appending new value of index for b in DLL,I need to remove it's previous instance for which I will have to traverse the DLL.

My question is then how come is the solution for updating DLL is O(1),isn't it O(n),which would lead to a total complexity of O(n^2)?

[O(n) for traversing string S,then O(n) for updating DLL]

Please correct me if I am wrong.

Why are you putting the c++ tag here? You have to ask about concrete code when using it. Read the [tag wiki](http://stackoverflow.com/tags/c%2b%2b/info) before adding tags please — πάντα ῥεῖ, Oct 14 '16 at 13:03
I think this question requires the tag C++ as in java there is linkedhashmap which could provide a better complexity,though I am not sure. — Saurabh, Oct 14 '16 at 13:04
The linked answer uses "hash table" as a synonym for "associative array", or "map". You'll note that since the characters in T are only considered once, a simple array suffices. Then the idea is to simply assign, or overwrite, `t['a']` with the index it is encountered in S. You don't need _all_ the indices with the current problem statement. Another way to put it is your hash table buckets only ever have a constant length / size of 1. — Michael Foukarakis, Oct 14 '16 at 13:13
The comments below the answer address that. Inserting at the tail of a doubly-linked list is O(1), and in this case the list remains sorted because you're traversing left-to-right. — Michael Foukarakis, Oct 14 '16 at 13:21
Yeah, I read that but then the example taken there was not like this one,in that example, they removed the first element from DLL and append an element at last,which will be O(1),but what about a scenario when the element to be deleted is not the first? — Saurabh, Oct 14 '16 at 13:25

score 1 · Accepted Answer · edited May 23 '17 at 12:33

1

The data structure described is a doubly-linked list, visualised as follows, using the example in the answer:

HEAD <=> ('c', 0) <=> ('b', 3) <=> ('a', 5) <=> TAIL

coupled with an array, which is visualised as follows:

      'a'      |      'b'      |      'c'
 {5, &list[2]} | {3, &list[1]} | {0, &list[0]}

(notice the duplication of information)

At any point during the described algorithm, the following operations are necessary:

insertion of an element in the array
insertion of an element in the list
finding the index value for a given character
removal of a node from the list

The time-complexity of #1 is O(1) (trivial), and of #2 also O(1) because of our choice of doubly-linked list.

The time-complexity of #3 is also O(1), because of our choice of array. It would still be O(1) for a hash-table-backed map.

The time-complexity of #4 is O(1) because our array holds a pointer to each node in the list, and each character only ever has one node in the list, therefore removal is also trivial.

It's important to note (and understand) here that at any iteration of the proposed algorithm, the minimum window cannot be made larger than what it already was in the last step (why? prove it), and therefore we only need at most one node per character in the string Y (or T in your question).

Hope this helps.

edited May 23 '17 at 12:33

Community

1
1

answered Oct 14 '16 at 13:43

Michael Foukarakis

39,737
6
87
123

Ohhk,this was the point that I was missing,so T[a] not only contains it's index,but also the character's reference which is present in the DLL. This helped me clear the doubt.Thanks!! – Saurabh Oct 14 '16 at 18:53
Just one more doubt,if my string T is ***bdpa** and the table size is 4,so both "b" and "d" would be returning same index(according to my function),so for "d" I will linearly probe in the table untill I find a vacant cell.This would create a problem when I will be matching charcters from string S in string T,so is there a way to avoid collision in this question? – Saurabh Oct 17 '16 at 14:03
You are needlessly complicating things, but only because you're following the other answer blindly. There is no need for a hash table here, unless your alphabet (`len(T)`) is so large that actually using a map benefits you. Yes, linearly probing each bucket makes this algorithm quadratic. That's why you have to avoid it by using a simpler data structure. – Michael Foukarakis Oct 17 '16 at 14:09
Well,I am not following the answer blindly,this was an assignment question given in my college,that's why I am trying to implement this way. – Saurabh Oct 17 '16 at 14:17
Well, the answer to your previous question is yes, there is a way to avoid collisions: use a perfect hash function, which is guaranteed to exist for an alphabet and is only unsuitable if the alphabet is so large, it exceeds available memory. Since we've already established we need _at most one_ node per character, a simple array suffices. – Michael Foukarakis Oct 17 '16 at 14:21
Found an answer here:[link](http://stackoverflow.com/questions/7666509/hash-function-for-string),but if my table size is 4,then again,using this hashfunction would result in same indexes i.e. 1 for charachters "d" and "p". – Saurabh Oct 17 '16 at 14:36

Finding a minimum length substring in string S which contains all characters from string T using Hash Table in O(n)

1 Answers1