Short answer
If the number of possible characters (not to be confused with the length of the strings) is not fixed - which is not the case here - the time complexity of your algorithm is O(n^2). If we make the assumption that there is only a fixed number of valid characters (in this case 255, or 4'294'967'296 for 32-bit characters), your algorithm runs in worst-case O(n). Under that assumption the algorithm can then easily be improved to run in O(1).
Note on asymptotic behavior and big-oh: these are theoretical results. The fact that an algorithm runs in O(1) does not mean it runs in reasonable time; it means it runs in constant time. So - asymptotically speaking - it won't make any difference whether you enter a string of length 10^1'000 or one of length 10^10'000 (given these lengths are large enough). That constant time can still be more than a hundred times the age of the universe.
Explanation
You can do a simple worst-case analysis of the for loops:
for (; *scout != '\0'; ++scout, ++offset)
    for (start = (char *)str + offset; *start != '\0'; ++start)
        //single statement
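For context, the complete function being analysed presumably looks something like this - a reconstruction from the loop headers above and the bounded variant shown further down, so the exact original may differ slightly:

int strunique(const char *str)
{
    size_t offset = 1;
    char *scout = (char *)str, *start;

    for (; *scout != '\0'; ++scout, ++offset)
        for (start = (char *)str + offset; *start != '\0'; ++start)
            if (*start == *scout)   /* this is the "single statement" */
                return 0;
    return 1;
}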
Now we want to know how many times the single statement (it contains a fixed number of statements) will be repeated. Since you never modify the content of the string, we know there is an index n at which the value is \0.
So we can rewrite it as:
for (scout = 0, offset = 0; scout < n; ++scout, ++offset)
    for (start = offset; start < n; ++start)
        //single statement
(I've assumed the string starts at memory address 0; since that is only a shift, it is allowed, and it only makes it simpler to reason about this.)
Now we're going to calculate the number of times the single statement in the inner for loop is executed (parameterized by the offset). That is equal to:

n - o

with o the offset and n the length of the string.
Now we can use this formula to calculate the number of executions at the outer for-loop level. Here o starts at 0 and iterates up to (excluding) n. So the total number of executions is:

(n - 0) + (n - 1) + (n - 2) + ... + (n - (n-1)) = n + (n-1) + ... + 1 = n*(n+1)/2
Which is O(n^2).
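If you want to convince yourself of this sum, here is a small self-contained counter that simply mirrors the rewritten index-based loops above and counts the executions of the single statement (the value n = 10 is an arbitrary example, not something taken from the question):

#include <stdio.h>

int main(void)
{
    size_t n = 10, count = 0;   /* n plays the role of the string length */

    /* mirror the rewritten loops; count executions of the single statement */
    for (size_t scout = 0, offset = 0; scout < n; ++scout, ++offset)
        for (size_t start = offset; start < n; ++start)
            ++count;

    printf("count = %zu, n(n+1)/2 = %zu\n", count, n * (n + 1) / 2);
    return 0;
}

Both printed values are 55 for n = 10.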
But now one has to ask: is it possible to construct such a string? The answer is no! There are only 255 valid characters (the NUL character is not considered to be a character); the quadratic bound above only applies if we cannot make this assumption. Say the first character is an a (with a an arbitrary char). Then either it matches another a in the string, which can be detected in O(n) time (loop through the remainder of the string), or all other characters are different from a. In the first case, the algorithm terminates in O(n); in the second case, this means that the second character is different from the first.
Let's say the second character is b. Then we again iterate over the string in O(n), and if we find another b we terminate, after 2n or O(n) steps. If not, we need to try to find a match for the next character c.
The point is that we only need to do this at most 255 times, because there are only 255 valid characters. As a result the time complexity is 255n, or O(n).
Alternative explanation
Another variant of this explanation is: if the outer for loop is looking at the i-th character, we know that all characters to the left of i are different from that character (otherwise we would already have rejected the string earlier). Now since there are only 255 characters, and all characters to the left are different from each other and from the current character, we know that by the 256-th character of the string we cannot find a new, different character anymore, because there are no such characters left.
Example
Say you have an alphabet with 3
characters (a
,b
and c
) - this only to make it easier to understand the matter. Now say we have a string:
scout
v
b a a a a a b
  ^
  start
It is clear that your algorithm will use O(n) steps: the start will simply iterate over the string once, reach the second b and return. Now say there is no duplicate of b in the string. In that case the algorithm does not stop after iterating over the string once. But this implies all the other characters differ from b (after all, we've iterated over the string and didn't find a duplicate). So now consider a string with that condition:
scout
v
b a a a a a a
  ^
  start
Now it is clear that a first attempt to find another b in the remainder of the string will fail, so your algorithm increments the scout:
  scout
  v
b a a a a a a
    ^
    start
And starts searching for a. Here we're very lucky: the first character the start visits is an a. But wherever the duplicate a might be, it would cost at most two iterations over the string, so O(2n), to find it.
Now we're reaching the boundary case: there is no second a either. In that case, we know the string must begin with
    scout
    v
b a c ...
We furthermore know that the remainder of the string cannot contain any b's or a's, because otherwise the scout would never have moved that far. The only remaining possibility is that the remainder of the string contains c's. So the string reads:
    scout
    v
b a c c c c c c c c c c
      ^
      start
This means that after iterating over the string at most 3 times, we will find such a duplicate, regardless of the size of the string or how the characters are distributed within it.
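To see this claim in action, here is a small instrumented copy of the loops - the helper name strunique_verbose is made up for this illustration - that reports how many outer passes the scout needed before a duplicate turned up, using a string like the one sketched above:

#include <stdio.h>

static int strunique_verbose(const char *str)
{
    size_t offset = 1, passes = 0;
    const char *scout, *start;

    for (scout = str; *scout != '\0'; ++scout, ++offset) {
        ++passes;
        for (start = str + offset; *start != '\0'; ++start)
            if (*start == *scout) {
                printf("duplicate '%c' found after %zu outer passes\n",
                       *scout, passes);
                return 0;
            }
    }
    printf("no duplicate found after %zu outer passes\n", passes);
    return 1;
}

int main(void)
{
    /* with a 3-character alphabet, at most 3 outer passes are needed */
    strunique_verbose("bacccccccccc");
    return 0;
}

For this input it reports that the duplicate c was found after 3 outer passes, no matter how many c's you append.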
Modify this to O(1) time
You can easily modify this algorithm to make it run in O(1) time: simply place additional bounds on the indices:
int strunique(const char *str)
{
    size_t offset = 1;
    char *scout = (char *)str, *start, *stop = scout + 255;
    for (; scout < stop && *scout != '\0'; ++scout, ++offset)
        for (start = (char *)str + offset; start <= stop && *start != '\0'; ++start)
            if (*start == *scout)
                return 0;
    return 1;
}
In this case we've bounded the outer loop such that it visits at most the first 255 characters, and the inner loop visits only the first 256 (notice the <= instead of <). So the total number of steps is bounded by 255 x 256, which is O(1). The explanation above already demonstrates why this is sufficient.
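A minimal usage sketch, assuming the strunique definition above sits in the same file (the two test strings are arbitrary examples):

#include <stdio.h>

int main(void)
{
    printf("%d\n", strunique("baaaaab"));   /* prints 0: the b repeats         */
    printf("%d\n", strunique("abc"));       /* prints 1: all characters unique */
    return 0;
}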
Note: in case the characters are 32-bit instead of 8-bit, you need to replace 255 by 4'294'967'296, which still makes it theoretically O(n), but in practice the constant factor in front of the n is so huge that the O(n^2) algorithm will outperform it for any realistic string length.
Since we combine the string-termination check with the 256-bound check, this algorithm will nearly always run faster than the unbounded one above. The only source of potentially extra cycles is the additional testing that comes with the modified for loops, but since these lead to earlier termination, in many cases they will not result in additional time.
Big-oh
One could object: "Yes, that's true for strings with length greater than or equal to 256 characters, but what about strings shorter than 256?" The point is that big-oh analysis deals with asymptotic behavior. Even if the behavior were super-exponential for strings up to some fixed length, those strings are not taken into account.
To emphasize the (problematic) aspect of asymptotic behavior a bit more, one can say that the following algorithm is correct, asymptotically speaking:
int strunique(const char *str) {
    return 0;
}
It always returns false, because there is a length n0 such that for every input length n > n0 this algorithm returns the correct result: here n0 = 255, since any string longer than 255 characters drawn from only 255 valid characters must contain a duplicate. This has not much to do with big-oh itself; it's more to illustrate that one must be careful when claiming an algorithm running in O(1) will outperform an algorithm running in O(n^6) for any reasonable input. Sometimes the constant factor can be gigantic.