
I'm using Java and trying to append ASCII character 29 (Group Separator) to an alphanumeric String as part of my algorithm. But I'm unable to verify the output, since the character doesn't get printed.

If it's a non-printable character, is there any other way I can verify that it actually gets added?

What I tried: 1) printing it like any other ASCII character; 2) printing it via its hex value (0x1D):

System.out.println("Test1====="+Character.toString((char)0x1D));

System.out.println("Test3====="+String.valueOf(Character.toChars(29)));

Expected result: I can verify that the character is printed. Actual result: unable to verify.

SidTechs1
  • You can look into the char array of the string you are building (saving it in a variable first), then you can see if it is saved inside the string or not. – Progman Mar 29 '19 at 20:00
  • 1
    @karthick: It's not a duplicate. I had a look at that question before posting mine, and the two have different purposes. – SidTechs1 Mar 29 '19 at 20:52
  • @Progman: Yes, that's what I'm trying to work out: how to verify whether it's saved inside the String or not. – SidTechs1 Mar 29 '19 at 20:54
  • 1
    `System.out.println("Test1=====\u001D");` The question shouldn't be whether you can get the character into the String, but whether you can get it out through your output stream, which is affected by your Java implementation, terminal, user settings, operating system, …. – Tom Blodget Mar 29 '19 at 22:20
  • fwiw, title says character 29, question text says 0x1d (=29), code sample says 0x1f (=31). –  Mar 30 '19 at 02:33
  • @another-dave: Edited the question. Everything else was already correct. Only one line (`System.out.println("Test1====="+Character.toString((char)0x1F));`) was a typo on my part, and that's been fixed now. Thanks. – SidTechs1 Apr 01 '19 at 14:52

2 Answers

2

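One option is to write a function that traverses the string and compares every char to character 29. Something along the lines of:

static boolean containsChar29(String str) {
    for (int i = 0; i < str.length(); i++) {
        // Character.toChars(29) returns a char[], so compare against the char value itself
        if (str.charAt(i) == (char) 29) {
            return true;
        }
    }
    return false;
}

// e.g. containsChar29("Foo Bar" + yourCharacter29ToString)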

This could be enough as a proof of concept. (I did not check the code above, so please read it as pseudo-code.)
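If the aim is also to see the character in console output, one possibility is to replace control characters with a visible escape before printing. The following is only a sketch: the class `GroupSeparatorCheck` and its method names are hypothetical and not part of the original answer, and it assumes that treating every ISO control character the same way is acceptable.

```java
// Hypothetical helper class; all names here are illustrative assumptions.
final class GroupSeparatorCheck {
    // Decimal 29 and hex 0x1D are the same value: the Group Separator.
    static final char GS = 0x1D;

    // Returns true if the string contains at least one Group Separator.
    static boolean containsGroupSeparator(String s) {
        return s.indexOf(GS) >= 0;
    }

    // Returns a copy of the string in which every ISO control character
    // is replaced by a printable backslash-u escape with four hex digits.
    static String showControlChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isISOControl(c)) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "Foo Bar" + GS;
        System.out.println(containsGroupSeparator(str)); // prints true
        // Prints "Foo Bar" followed by the visible escape for code 001D.
        System.out.println(showControlChars(str));
    }
}
```

Because the escape is built from ordinary printable characters, the result should look the same regardless of how the terminal handles control codes.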

hyperlinq
  • Sure, I will try this and update you in a few minutes. – SidTechs1 Mar 29 '19 at 20:54
  • Thanks @hyperlinq. This helped, almost. I just changed Character.toChars(29) to its hex value (0x1D), since only this format was compatible. Thanks. – SidTechs1 Mar 29 '19 at 21:29
  • 1
    It cannot possibly be the case that toChars(0x1d) works and toChars(29) does not. It is the exact same number in both cases; numbers are not "hexadecimal" or "decimal" in the machine, these are just two different ways of writing the same thing in the source code. –  Mar 30 '19 at 02:37
  • @another-dave: I was wondering the same thing: why it worked with 0x1d and not 29. The table at this link helped to clarify: https://www.systutorials.com/4670/ascii-table-and-ascii-code/ . – SidTechs1 Apr 01 '19 at 14:59
  • Decimal Octal Hex Binary Character ------- ----- ---- -------- ---------------------------- 028 034 0x1C 00011100 FS (File Separator) 029 035 0x1D 00011101 GS (Group Separator) ctrl 030 036 0x1E 00011110 RS (Request to Send)(Record Separator) code 031 037 0x1F 00011111 US (Unit Separator) – SidTechs1 Apr 01 '19 at 15:03
  • It appears unformatted here. Copy paste to a text editor and it comes out fine, formatted – SidTechs1 Apr 01 '19 at 15:04
1

To see which codepoints are in a String, you can iterate over them and use Character.getName(codepoint):

int[] codepoints = ("Test3====="+String.valueOf(Character.toChars(29)))
    .codePoints()
    .toArray(); // optionally, set up for traditional for loop

for (int codepoint : codepoints) {
    char[] utf16 = Character.toChars(codepoint); // always one or two code units

    if (utf16.length == 2) {
        System.out.println(
          String.format("U+%04X \\u%04X\\u%04X %s", 
            codepoint, (int)utf16[0], (int)utf16[1], Character.getName(codepoint)));

    } else {
        System.out.println(
          String.format("U+%04X \\u%04X %s", 
            codepoint, (int)utf16[0], Character.getName(codepoint)));
    }
}

The UTF-16 character encoding encodes a codepoint from the Unicode character set with one or two code units (char).
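As a small illustration of the one-or-two code units point, the sketch below compares the Group Separator with a code point outside the Basic Multilingual Plane. The class name `CodeUnitDemo` and the choice of U+1F600 as the second code point are assumptions made for this example only.

```java
// Hypothetical demo class; the names are illustrative assumptions.
final class CodeUnitDemo {
    // Number of UTF-16 code units (char values) needed for one code point.
    static int codeUnits(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        System.out.println(codeUnits(0x1D));    // prints 1 (Group Separator)
        System.out.println(codeUnits(0x1F600)); // prints 2 (surrogate pair)

        // One BMP letter plus one supplementary code point.
        String s = "A" + new String(Character.toChars(0x1F600));
        System.out.println(s.length());                      // prints 3 (code units)
        System.out.println(s.codePointCount(0, s.length())); // prints 2 (code points)
    }
}
```

So character 29 always occupies exactly one char in a String, which is why comparing it with `charAt(i)` is sufficient in this case.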

(Not sure how the existence of the ASCII character set is relevant to this project—or most any project. If you have bytes for ASCII-encoded text or need bytes for ASCII-encoded text, that's a different question. But, Java uses UTF-16 for text datatypes.)
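If ASCII-encoded bytes do turn out to be what is needed, a hex dump of the encoded bytes is one way to check for the value 0x1D. This is only a sketch under that assumption; the class `AsciiBytesDemo` and the method `toHex` are hypothetical names, not something from the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative assumptions.
final class AsciiBytesDemo {
    // Encodes the string as US-ASCII and returns the bytes as
    // space-separated two-digit hex values.
    static String toHex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' is 0x41, 'B' is 0x42, and character 29 is 0x1D.
        System.out.println(toHex("AB" + (char) 29)); // prints 41 42 1D
    }
}
```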

Tom Blodget
  • 1
    Upvoted for the comment that (to paraphrase) not all text is called "ASCII". One might think that after 28 years of Unicode the world would have got it :-) –  Mar 30 '19 at 02:41
  • Thanks for your inputs @Tom Blodget ! – SidTechs1 Apr 01 '19 at 15:05