Getting the type of Unicode in Java

Question

This is an interview question:

Return true or false for a given string value and its corresponding unicode

public boolean decode (String value, String unicode){
    // logic goes here
}

For example, if the given inputs are:

String value = "abc"; String unicode = "UTF-8"; return value is false
String value = "\u00A3"; String unicode = "ASCII"; return value is true

I read in an article that Unicode values are determined internally by bytes. So my first idea was to check the range: for example, if the value falls in the range between 40 and 63, it's ASCII. Please correct me if I am wrong with this logic, and tell me if there is a better way to find out the Unicode type.

@MikeSamuel I think he's talking about encoding instead of unicode. — Drogba, Feb 26 '13 at 04:38
This question seems to conflate a number of different things around byte<->character encodings. Have you read ["The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets"](http://www.joelonsoftware.com/articles/Unicode.html)? — Mike Samuel, Feb 26 '13 at 04:58

score 0 · Answer 1 · answered Feb 26 '13 at 04:58

This is a fairly bad specification for a function. In an interview you'll need to respond by pretending a client has made this request for software implementation. So you'll ask gently for clarification about the intentions behind the specification. Or you'll introduce criticism within questions as if you're the student and you wish to be taught. You might say:

I'm not accustomed to using the word "Unicode" as a generic term for encodings such as ASCII and UTF-8. Am I correct that that's what the parameter is for? Could we name it "encoding", so that I will remember its purpose more easily?
So, it appears we're concerned with certain encodings, rather than, say, all the encodings that the Internet Engineering Task Force has ever named, am I right? You see, I'm referring to the MIME standard, which provides that IETF designates an official registry of names for encodings. There are hundreds or thousands of them.
I noticed that we are to return false to a query concerning UTF-8 when the text is "abc". Is that because the code points in that text are all in the range that UTF-8 has in common with ASCII, so that the encoded text is identical for UTF-8 encoding as for ASCII encoding? Do we do similarly for another encoding such as ISO-8859-1 which contains ASCII as a subset?
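
While waiting for the clarification, it may help to see how the two readings hinted at in these questions could be checked in Java. The following is only a sketch under assumptions: the class name `EncodingCheck` and both method names are hypothetical, and neither method is claimed to be what the interviewer meant by `decode`. The first asks whether a string can be represented in a named encoding at all; the second asks whether the encoded bytes come out identical to the ASCII bytes, which is the "range in common with ASCII" idea.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the original question.
final class EncodingCheck {

    // True if every character in value can be represented in the named encoding.
    // Charset.forName throws if the name is unknown or illegal.
    static boolean canEncode(String value, String encoding) {
        return Charset.forName(encoding).newEncoder().canEncode(value);
    }

    // True if value is pure ASCII and the named encoding produces
    // exactly the same bytes for it as US-ASCII does.
    static boolean sameBytesAsAscii(String value, String encoding) {
        // Guard first: getBytes silently substitutes '?' for unencodable characters,
        // so comparing bytes alone could give a false match.
        if (!canEncode(value, "US-ASCII")) {
            return false;
        }
        byte[] ascii = value.getBytes(StandardCharsets.US_ASCII);
        byte[] other = value.getBytes(Charset.forName(encoding));
        return Arrays.equals(ascii, other);
    }
}
```

With this sketch, `EncodingCheck.canEncode("\u00A3", "US-ASCII")` is false because the pound sign lies outside ASCII, while `EncodingCheck.sameBytesAsAscii("abc", "UTF-8")` is true. Neither result matches the expected outputs in the question, which is one more reason to ask what the function is really supposed to decide.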

score 0 · Answer 2 · answered Feb 26 '13 at 06:20

Unicode Equivalent of ANSI

ANSI characters 32 to 127 correspond to those in the 7-bit ASCII character set, which forms the Basic Latin Unicode character range. Characters 160–255 correspond to those in the Latin-1 Supplement Unicode character range.

As you can observe, there are ASCII-equivalent Unicode values in that table. So you had better ask the interviewer what the requirements really are.
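
If the requirement turns out to be about these character ranges, a rough way to test them in Java is to look at the Unicode block of each code point. This is a minimal sketch, assuming that "ASCII" means the Basic Latin block (U+0000 to U+007F) and that the wider range means Basic Latin plus Latin-1 Supplement (up to U+00FF). The class and method names are hypothetical.

```java
// Hypothetical helper class illustrating the range check; names are illustrative only.
final class BlockCheck {

    // True if every code point of value is in the Basic Latin block (U+0000..U+007F).
    static boolean isBasicLatin(String value) {
        return value.codePoints()
                .allMatch(cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.BASIC_LATIN);
    }

    // True if every code point is in Basic Latin or Latin-1 Supplement (U+0000..U+00FF).
    static boolean isLatin1Range(String value) {
        return value.codePoints().allMatch(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            return block == Character.UnicodeBlock.BASIC_LATIN
                    || block == Character.UnicodeBlock.LATIN_1_SUPPLEMENT;
        });
    }
}
```

For example, `BlockCheck.isBasicLatin("abc")` is true, while `BlockCheck.isBasicLatin("\u00A3")` is false and `BlockCheck.isLatin1Range("\u00A3")` is true, because the pound sign sits in the Latin-1 Supplement block.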
