-3

IsUnique: Implement an algorithm to determine if a string has all unique characters.

In one of my version, I'm using RegExp. Can anyone please explain to me what is the time complexity of this algorithm and why?

const isUniqueV2 = function isUniqueV2(str) {
  const cleanStr = str.toLowerCase().replace(/[^a-z0-9]/g, '');
  const strlen = cleanStr.length;
  if(!strlen) return false;
  const reg = new RegExp(/(.)[^\1]*?\1/g);
  if(reg.test(cleanStr)) return false;
  return true;
}

Time Complexity of a RegExp varies with the implementation. I have a version with O(N). I just want to know will this one performs better than the one using dictionary O(N)?

CertainPerformance
  • 356,069
  • 52
  • 309
  • 320
sushobhan
  • 45
  • 7
  • 2
    The question doesn't appear to include any attempt at all to solve the problem. StackOverflow expects you to [try to solve your own problem first](https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users), as your attempts help us to better understand what you want. Please edit the question to show what you've tried, and show a specific roadblock you're running into with [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve). For more information, please see [How to Ask](https://stackoverflow.com/help/how-to-ask). – Andreas Dec 11 '18 at 08:52
  • Whatever you're doing it's o[1], you're not looping over any elements but i'm not sure about how the RegExp works, it has to do some looping but i'm sure it's efficient shouldn't be more than o[n] if n is the length of the str – Omar Mneimneh Dec 11 '18 at 08:57
  • 1
    @OmarMneimneh Even though the pattern will require the engine to iterate over each character a number of times proportional to the length of the string? Worst-case, it'd require something around `0.5 * L^2` steps – CertainPerformance Dec 11 '18 at 09:00
  • @Andreas I have solved this problem is 4 versions. I have a clear idea about the rest of the solutions time complexities but as in this version I used Regex I get confused. I don't have any idea how to calculate time complexity for RegExp. – sushobhan Dec 11 '18 at 09:02
  • Possible duplicate of [What is the complexity of regular expression?](https://stackoverflow.com/questions/4378455/what-is-the-complexity-of-regular-expression) – Andreas Dec 11 '18 at 09:03
  • @CertainPerformance so you mean it can exceed o[n]? I wasn't concerned about the RegExp because I don't think a String is usually large, that's why i didn't take it into consideration, to be honest i don't know how the RegExp validation works – Omar Mneimneh Dec 11 '18 at 09:04
  • @OmarMneimneh I'm not sure on the terminology, but the *number of steps* is `O(N^2)`, where `N` is the length of the input string. If one considers RE engine steps to correspond to an algorithm's time complexity, then yeah, it'd be `O(N^2)` overall – CertainPerformance Dec 11 '18 at 09:05
  • 1
    I'm fairly sure it's linear. We have `toLowerCase()` which is O(n), and we have `replace()` which is O(n), and `test()` which is also also O(n). I don't see anything that would be O(n^2)... – Kresimir Dec 11 '18 at 09:07
  • @CertainPerformance I tried to make my expression non-greedy that why added the '?' quantifier. I think otherwise the regex engine has to do more work. – sushobhan Dec 11 '18 at 09:13
  • @sushobhan If the number of steps is the factor you're interested in, lazy quantifiers usually make a pattern take *longer* time to complete, not shorter, because after each character the engine has to check to see if the rest of the pattern matches before forward-tracking. Compare [`.*?x`](https://regex101.com/r/dLQ9N9/1) (15 steps) against `.*x` (4 steps) – CertainPerformance Dec 11 '18 at 09:16
  • @CertainPerformance Oh, I didn't know that. I thought non-greedy approach runs faster than greedy approach – sushobhan Dec 11 '18 at 09:17

1 Answers1

2

Technically, in the worst-case, the time complexity of the algorithm will be O(N), but why it's O(N) is a bit complicated. There are three operations to consider.

First, the toLowerCase() on the input string. This is O(N) with respect to the string's length, easy.

Second, the first .replace function: .replace(/[^a-z0-9]/g, ''). This is also O(N) - iterate through all characters, and replace non-alphanumeric characters with the empty string.

Third, and most complicated: the /(.)[^\1]*?\1/g test. Let's break down this regular expression first. Note that \1 inside a character set probably doesn't do what you think it is - it's not a backreference, it matches the unicode character at index 1, which is the Start of Heading control character:

console.log(/[\1]/.test(String.fromCharCode(1)));
console.log(String.fromCharCode(1));
// not the sort of thing that would appear in an ordinary string, as you can see

That's not what you want. Let's fix that, for the sake of simplicity - it won't make any difference for your algorithm's complexity, so let's assume we're using the pattern /(.).*?\1/ instead.

It will capture the first character in a group, then lazy-repeat any character, trying to find the character matched by the first group again. The regex engine will attempt this test starting at the first character in the string - if the string length is N, it will start at index 0 and iterate over indicies 0 to N - 1, checking whether any characters match the character at index 0. Since we're assuming the worst-case, we can assume that this will fail (no duplicates of the first character found), and we've carried out about N operations. Then, the engine will try to match starting at the next index, index 1, and will iterate over each following character until it gets to the end of the string (N - 1), looking for the same character matched at index 1. Worst-case, this will fail, we've just carried out around N - 1 operations, and the engine will forward-track another character, to index 2. See the pattern?

Starting index     ~Operations required to check this index    ~Total operations
0                  N                                           N
1                  N-1                                         2N-1
2                  N-2                                         3N-3
3                  N-3                                         4N-6
4                  N-4                                         5N-10
...
N-1                1                                           N^2 - 0.5N^2

Worst-case, there are no duplicate characters in the string, and the engine takes around 0.5N^2 steps to carry out the entire .test function. This isn't exact because there's some overhead related to matching the captured character, but it's insignificant compared to the N^2 factor. Try it out on regex101 - you can see that an input of 62 alphanumeric characters takes 2203 steps, which isn't that far from 0.5 * 62^2 = 1922.

So, because this .test function has O(N^2) complexity in the worst case, it would sound like the algorithm as a whole has O(N^2) complexity, right? Actually, no! The reason is that the first .replace(/[^a-z0-9]/g, '') ensures that the string being tested will contain only lower-case letters and digits (36 possible characters). This means that the .test can only iterate over a maximum of 36 characters before returning true - a 37th character (or any character after that) will necessarily be a duplicate of one of the previous characters, because there are only 36 possible unique characters. The worst-case string would look something like

0123456789abcdefghijklmnopqrstuvwxyzzzzzzzzzzzzzzzzzzzzzz...

which would require around 36N steps to get to the zs, find that they're duplicated, and pass the .test. So, the worst case for the .test, given the constrained input, is actually O(N), not O(N^2)!

To sum up: the toLowerCase() is O(N) in the worst-case. The .replace is O(N) in the worst case. Finally, the .test is O(N) in the worst case. So, your function's time complexity is O(N) in the worst case.

All that said, although it may be O(N), it's still relatively inefficient compared to your other implementation, which iterates over each character in the string and adds it as a property of an object, returning true once any duplicate character is found.

CertainPerformance
  • 356,069
  • 52
  • 309
  • 320