Technically, in the worst-case, the time complexity of the algorithm will be O(N)
, but why it's O(N)
is a bit complicated. There are three operations to consider.
First, the toLowerCase()
on the input string. This is O(N)
with respect to the string's length, easy.
Second, the first .replace
function: .replace(/[^a-z0-9]/g, '')
. This is also O(N)
- iterate through all characters, and replace non-alphanumeric characters with the empty string.
Third, and most complicated: the /(.)[^\1]*?\1/g
test. Let's break down this regular expression first. Note that \1
inside a character set probably doesn't do what you think it is - it's not a backreference, it matches the unicode character at index 1, which is the Start of Heading control character:
console.log(/[\1]/.test(String.fromCharCode(1)));
console.log(String.fromCharCode(1));
// not the sort of thing that would appear in an ordinary string, as you can see
That's not what you want. Let's fix that, for the sake of simplicity - it won't make any difference for your algorithm's complexity, so let's assume we're using the pattern /(.).*?\1/
instead.
It will capture the first character in a group, then lazy-repeat any character, trying to find the character matched by the first group again. The regex engine will attempt this test starting at the first character in the string - if the string length is N
, it will start at index 0
and iterate over indicies 0
to N - 1
, checking whether any characters match the character at index 0
. Since we're assuming the worst-case, we can assume that this will fail (no duplicates of the first character found), and we've carried out about N operations. Then, the engine will try to match starting at the next index, index 1
, and will iterate over each following character until it gets to the end of the string (N - 1
), looking for the same character matched at index 1. Worst-case, this will fail, we've just carried out around N - 1
operations, and the engine will forward-track another character, to index 2. See the pattern?
Starting index ~Operations required to check this index ~Total operations
0 N N
1 N-1 2N-1
2 N-2 3N-3
3 N-3 4N-6
4 N-4 5N-10
...
N-1 1 N^2 - 0.5N^2
Worst-case, there are no duplicate characters in the string, and the engine takes around 0.5N^2
steps to carry out the entire .test
function. This isn't exact because there's some overhead related to matching the captured character, but it's insignificant compared to the N^2
factor. Try it out on regex101 - you can see that an input of 62 alphanumeric characters takes 2203 steps, which isn't that far from 0.5 * 62^2 = 1922
.
So, because this .test
function has O(N^2)
complexity in the worst case, it would sound like the algorithm as a whole has O(N^2)
complexity, right? Actually, no! The reason is that the first .replace(/[^a-z0-9]/g, '')
ensures that the string being tested will contain only lower-case letters and digits (36 possible characters). This means that the .test
can only iterate over a maximum of 36 characters before returning true
- a 37th character (or any character after that) will necessarily be a duplicate of one of the previous characters, because there are only 36 possible unique characters. The worst-case string would look something like
0123456789abcdefghijklmnopqrstuvwxyzzzzzzzzzzzzzzzzzzzzzz...
which would require around 36N
steps to get to the z
s, find that they're duplicated, and pass the .test
. So, the worst case for the .test
, given the constrained input, is actually O(N)
, not O(N^2)
!
To sum up: the toLowerCase()
is O(N)
in the worst-case. The .replace
is O(N)
in the worst case. Finally, the .test
is O(N)
in the worst case. So, your function's time complexity is O(N)
in the worst case.
All that said, although it may be O(N)
, it's still relatively inefficient compared to your other implementation, which iterates over each character in the string and adds it as a property of an object, returning true
once any duplicate character is found.