Code points, and Control Picture characters
I can add two points:
- The
char
type is essentially broken since Java 2, and legacy since Java 5. Best to use code point integers when working with individual characters.
- Unicode defines characters for display as placeholders for control characters. See Control Pictures section of one Wikipedia page, and see another page, Control Pictures.
For example, the NULL character at code point 0 decimal has a matching SYMBOL FOR NULL character at 9,216 decimal: ␀
. To see all the Control Picture characters, use this PDF section of the Unicode standard specification.
Get an array of the code point integers representing each of the characters in your string.
int[] codePoints = myString.codePoints().toArray() ;
Loop those code points. Replace those of interest.
Here is some untested code.
int[] replacedCodePoints = new int[ codePoints.length ] ;
int index = 0 ;
for ( int codePoint : codePoints )
{
if( codePoint >= 0 && codePoint <= 32 ) // 32 is SPACE, so you may want to use 31 depending on your context.
{
replacedCodePoints[ index ] = codePoint + 9_216 ; // 9,216 is the offset to the beginning of the Control Picture character range defined in Unicode.
} else if ( codePoint == 127 ) // DEL character.
{
replacedCodePoints[ index ] = 9_249 ;
} else // Any other character, we keep as-is, no replacement.
{
replacedCodePoints[ index ] = codePoint ;
}
i ++ ; // Set up the next loop.
}
Convert code points back into text. Use StringBuilder#appendCodePoint
to build up the characters of text. You can use the following stream-based code as boilerplate. For explanation, see this Question.
String result =
Arrays
.stream( replacedCodePoints )
.collect( StringBuilder::new , StringBuilder::appendCodePoint , StringBuilder::append )
.toString();