
I am not familiar with ASCII rules or ASCII string representations. I have looked at the book Cracking the Coding Interview but cannot understand why a string can have only 256 possible character representations. How is that possible? Can anybody explain it, and help solve the problem with the simplest explanation possible?

Here is the question:

Implement an algorithm to determine if a string has all unique characters. What if you cannot use additional data structures?

Thanks in advance!

2 Answers


There is absolutely no need to limit yourself to 256 unique characters to solve this problem.

Here is a trivial algorithm:

  1. First, consider the string not so much a java.lang.String, but instead, an array of characters.
  2. Sort this array of characters in place. This takes no additional space and O(n log n) time.
  3. Loop through the char array, front to back, starting at index 1 (the second character). For each index, check if the char you find there is equal to the char you find at the previous index. If the answer is ever yes, return immediately, and answer false. If you get to the end without a hit, return true (the string consisted of unique characters).

The runtime characteristic is O(n log n), requiring no additional space, though you did mangle the input.

Now, in Java this is a tad tricky: java.lang.String instances are immutable. You cannot modify them, so step 2 (sort in place) is not possible. You'd have to make a char[] copy via yourString.toCharArray(), and then you can write this algorithm. That's an intentional restriction of String, not a fundamental limitation.
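The sort-then-scan steps above might look like this in Java (a sketch; `allUnique` is just an illustrative name, and the char[] copy is needed because of String immutability):

```java
import java.util.Arrays;

public class UniqueChars {
    // Sort a copy of the characters, then scan adjacent pairs for duplicates.
    static boolean allUnique(String s) {
        char[] chars = s.toCharArray(); // copy: we can't sort a String in place
        Arrays.sort(chars);             // O(n log n)
        for (int i = 1; i < chars.length; i++) {
            // After sorting, any duplicates sit next to each other.
            if (chars[i] == chars[i - 1]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allUnique("abcdef")); // true
        System.out.println(allUnique("hello"));  // false (two l's)
    }
}
```

Note the copy means this version technically uses O(n) extra space in Java; the "no additional space" claim holds only if you're allowed to treat the input as a mutable char array.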

But, if you want to go with the rule that the input also cannot be modified in any way, and you can't make any new data structures, that's still possible, and still has absolutely no requirement that 'strings can only select from a plane of 256 characters'. It's just going to be far, far slower:

  1. Loop through every character. i = position of the character.
  2. Loop through every further character (from i+1, to the end).
  3. Compare the characters at the two positions. If equal, return false.
  4. If you get to the end, return true.
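The nested-loop version above can be sketched as follows (again, the method name is just illustrative; nothing is copied or modified):

```java
public class UniqueCharsNoExtraSpace {
    // O(n^2) pairwise comparison: no sorting, no copies, input untouched.
    static boolean allUnique(String s) {
        for (int i = 0; i < s.length(); i++) {          // position of the character
            for (int j = i + 1; j < s.length(); j++) {  // every further character
                if (s.charAt(i) == s.charAt(j)) return false; // duplicate found
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allUnique("abc")); // true
        System.out.println(allUnique("aba")); // false
    }
}
```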

The runtime characteristic is O(n^2) (icky), requiring no additional space, and it does not modify any data in place.

The 256 thing just doesn't factor into any of this.

However, the thing is, a lot of code and examples incorrectly conflate the idea of 'a sequence of bytes' and 'a string' (as in, a sequence of characters), treating them all 'just as a bag o numbers'. At that point, if you have unicode characters, or charset encoding factors into the equation, all sorts of complications occur.

Properly written code knows that chars are chars, bytes are bytes, chars are never bytes and bytes are never chars, and every single last time you go from one to the other, you always, always, always specify which encoding, explicitly. If you do that, you'll never have any issues. But I guess your prof didn't want you to worry about it? I dunno - dumb restriction, not material whatsoever to the question.
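To make the chars-vs-bytes point concrete, here is a small sketch of always naming the encoding when crossing the boundary (the string literal is just an example):

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        String s = "héllo"; // 'é' is one char but not one byte in UTF-8
        // Always specify the charset explicitly when going String <-> byte[].
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String back = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(s.length());     // 5 chars
        System.out.println(utf8.length);    // 6 bytes: 'é' takes two in UTF-8
        System.out.println(back.equals(s)); // true: round-trip is lossless
    }
}
```

The char count and byte count differ, which is exactly why treating a string as "just a bag of numbers" falls apart the moment non-ASCII characters show up.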

rzwitserloot

It is because the ASCII table uses 8 bits, so there are at most 2^8 possible character combinations. Actually, there aren't 256 but 255, since the first bit is used to store the size.

Luiggi
  • ASCII is a 7-bit character code. (Unless you speak of extended ASCII). – S3DEV Aug 12 '20 at 22:46
  • Yes, that's true. I forgot to specify it because of European habits. – Luiggi Aug 12 '20 at 22:49
  • ASCII just isn't 8-bit. "Extended ASCII" is a group name that, as a concept, doesn't tell you what to do with, say, byte 0x95. It therefore does not explain this bizarre restriction at all. – rzwitserloot Aug 12 '20 at 22:56
  • And Java doesn't use ASCII anyway. Java uses Unicode, represented as 16-bit chars (UTF-16). – user13784117 Aug 13 '20 at 00:49