1

I'm trying to obtain the numeric values of ASCII characters as mentioned in http://www.ascii-code.com/

String str = "™æ‡©Æ";
for(int i = 0; i < str.length() ; i++) {
    char c = str.charAt(i);
    int code = (int) c;
    System.out.println(c + ":" +code);
}

Output:

™:8482
æ:230
‡:8225
©:169
Æ:198

My question is: Why are the values of '™' and '‡' not 153 and 135 respectively? And how can I obtain those values, if possible?

codelion
nexuscreator
  • 3
    Refer to the *Unicode* tables, not the "ASCII" tables. All strings in Java are sequences of Unicode characters (encoded as UTF-16). – user2864740 Aug 09 '14 at 13:47
  • http://en.wikipedia.org/wiki/List_of_Unicode_characters The first 128 characters from ASCII and Unicode are the same. – Jeroen Vannevel Aug 09 '14 at 13:47
  • Check this:- http://www.fileformat.info/info/unicode/char/2122/index.htm – Rahul Tripathi Aug 09 '14 at 13:50
  • 1
    Any character with a numeric value greater than 127 is not ASCII, by definition. – Hot Licks Aug 09 '14 at 13:50
  • @user2864740 Thank you for your comment. As those characters are listed in the 'extended ASCII' table, I'm trying to evaluate them as given in that table. Is it possible? – nexuscreator Aug 09 '14 at 13:51
  • possible duplicate of [Get ASCII value at input word](http://stackoverflow.com/questions/7443975/get-ascii-value-at-input-word) – Mureinik Aug 09 '14 at 13:52
  • 1
    "Extended ASCII" is not ASCII, nor is it Unicode. Java does not use "Extended ASCII", other than to know how to translate an ISO 8859-1 byte array to Unicode if you use the right method. – Hot Licks Aug 09 '14 at 13:53
  • 1
    @all Then my question is probably wrong to mention them as ASCII. Thank you all. – nexuscreator Aug 09 '14 at 13:56

4 Answers

5

Characters with a numeric value greater than 127 are not ASCII characters; it is more accurate to call them Unicode characters. Also, Extended ASCII is not ASCII. You are better off referring to the Unicode tables.

Note also that Java uses Unicode internally, not ASCII. More precisely, it uses UTF-16 most of the time.

You may refer to this and the List of Unicode characters.
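As a small illustration of the point above, here is a minimal sketch (the class name `CharValues` and the method `codeUnit` are made up for this example). Casting a `char` to `int` yields its UTF-16 code unit, which agrees with the ASCII table only for values 0 to 127:

```java
// Hypothetical demo class; names are assumptions for illustration only.
class CharValues {
    // A Java char is a UTF-16 code unit; widening it to int exposes that value.
    static int codeUnit(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(codeUnit('A'));      // ASCII range: same value as in the ASCII table
        System.out.println(codeUnit('\u2122')); // ™ is U+2122, so this is 0x2122 in decimal
        System.out.println(codeUnit('\u2021')); // ‡ is U+2021, so this is 0x2021 in decimal
    }
}
```

The values printed for '™' and '‡' are the same ones shown in the question's output, because they are Unicode code points rather than positions in a code page.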

Rahul Tripathi
0

ASCII assigns values only to 128 characters (a-z, A-Z, 0-9, space, some punctuation, and some control characters). The first 128 Unicode code points are the same as ASCII.

Unicode is a computing industry standard designed to consistently and uniquely encode characters used in written languages throughout the world. By convention, Unicode code points are written in hexadecimal, for example U+2122 for '™'.

There are two common encoding formats for Unicode: UTF-8, which uses 1-4 bytes for each code point (so for the first 128 characters, UTF-8 is exactly the same as ASCII), and UTF-16, which uses 2 or 4 bytes.
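To make the byte counts concrete, here is a rough sketch (the class `EncodingLengths` and its method names are hypothetical) that encodes a few sample strings and reports how many bytes each encoding needs. `UTF_16BE` is used so that no byte-order mark is added to the count:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are assumptions for illustration only.
class EncodingLengths {
    // Number of bytes needed to encode s as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to encode s as UTF-16 (big-endian, no BOM).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // A (ASCII), æ (U+00E6), ™ (U+2122), and U+1F600 (outside the BMP, a surrogate pair in Java).
        String[] samples = { "A", "\u00E6", "\u2122", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.println(s + " UTF-8: " + utf8Length(s) + " bytes, UTF-16: " + utf16Length(s) + " bytes");
        }
    }
}
```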

SparkOn
  • 1
    Make sure to *quote* verbatim text as appropriate: I gave a -1 because not doing so is fair neither to readers nor to the original author(s). Use the `>` character at the start of a line to begin a block quote. – user2864740 Aug 09 '14 at 14:28
0

I did not look through the Javadocs for a converter, but I did create an example to show why ASCII and Java's Unicode are not easily compatible. It converts each Unicode character into a UTF-8 byte array and then into a string representing that byte array. Rather than relying on a Java class, I would suggest creating an array of the ASCII equivalents and referencing that array for output.

  import java.nio.charset.StandardCharsets;
  import java.util.Arrays;

  public void showChars()
    {
        int end = 8192;
        for (int i = 0; i < end; ++i)
        {
            char c = (char) i;
            // Encode the single character as UTF-8; StandardCharsets avoids
            // the checked UnsupportedEncodingException of getBytes(String).
            byte[] data = Character.toString(c).getBytes(StandardCharsets.UTF_8);
            String byteStr = Arrays.toString(data);
            System.out.println(i + " char is " + c + " or " + byteStr);
        }
    }
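The lookup-table idea suggested in this answer could look roughly like the sketch below. The class `Cp1252Lookup` and the method `codeOf` are hypothetical, the target is assumed to be the Windows-1252 code page, and the table is deliberately partial: only three entries from the 128-159 range are filled in, so a real table would need the rest.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a hand-built lookup table; names are assumptions.
class Cp1252Lookup {
    // Partial table: only a few of the Windows-1252 codes in the 128-159 range.
    private static final Map<Character, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put('\u20AC', 128); // €
        TABLE.put('\u2021', 135); // ‡
        TABLE.put('\u2122', 153); // ™
    }

    // Returns the Windows-1252 code for c, or -1 if this partial table does not know it.
    static int codeOf(char c) {
        // In 0-127 and 160-255, Windows-1252 codes equal the Unicode code points.
        if (c < 0x80 || (c >= 0xA0 && c <= 0xFF)) {
            return c;
        }
        Integer code = TABLE.get(c);
        return code != null ? code : -1;
    }

    public static void main(String[] args) {
        for (char c : "\u2122\u00E6\u2021\u00A9\u00C6".toCharArray()) {
            System.out.println(c + ":" + codeOf(c));
        }
    }
}
```

A hand-written table keeps the mapping explicit, but it has to be maintained by hand.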
0

For the sake of answering the second question that was asked:

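import java.nio.charset.Charset;

final String str = "™æ‡©Æ";

// Passing a Charset avoids the checked UnsupportedEncodingException of getBytes(String).
final byte[] cp1252Bytes = str.getBytes(Charset.forName("windows-1252"));
for (final byte b: cp1252Bytes) {
    // Java bytes are signed; masking with 0xFF gives the unsigned value 0-255.
    final int code = b & 0xFF;
    System.out.println(code);
}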

Associating the code with each text element is more work.

import java.nio.charset.Charset;

final String str = "™æ‡©Æ";
final Charset cp1252 = Charset.forName("windows-1252");

final int length = str.length();
for (int offset = 0; offset < length; ) {
    // Step through the string one code point at a time (1 or 2 chars each).
    final int codepoint = str.codePointAt(offset);
    final int codepointLength = Character.charCount(codepoint);
    final String codepointString = str.substring(offset, offset + codepointLength);
    System.out.println(codepointString);
    // Encode just this code point and print its byte(s) as unsigned values.
    final byte[] cp1252Bytes = codepointString.getBytes(cp1252);
    for (final byte code : cp1252Bytes) {
        System.out.println(code & 0xFF);
    }
    offset += codepointLength;
}

This is somewhat easier with Java 8's String.codePoints() method:

final String str = "™æ‡©Æ";

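// Requires: import java.nio.charset.Charset;
// Looking the charset up once means no checked exception has to be handled inside the lambda.
final Charset cp1252 = Charset.forName("windows-1252");

str.codePoints()
    .mapToObj(i ->  new String(Character.toChars(i)))
    .forEach(c -> {
        // unsignedBytesToString is a helper that this answer does not define.
        System.out.println(
            String.format("%s %s",
                c,
                unsignedBytesToString(c.getBytes(cp1252))));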
    });
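The snippet above calls `unsignedBytesToString`, which the answer does not define. One possible version is sketched below. The enclosing class name `ByteFormat` and the space-separated output format are assumptions, not the original author's code:

```java
// Hypothetical helper; the original answer does not show its implementation.
class ByteFormat {
    // Formats each byte as an unsigned decimal (0-255), separated by single spaces.
    static String unsignedBytesToString(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Java bytes are signed; masking with 0xFF recovers the unsigned value.
            sb.append(b & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x99 is the Windows-1252 byte for ™.
        System.out.println(unsignedBytesToString(new byte[] { (byte) 0x99 }));
    }
}
```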
Tom Blodget