replacing all cases of ISO Control characters in a string with "CTRL"

Question

static String clean(String identifier) {
    String firstString = "";
    for (int i = 0; i < identifier.length(); i++)
        if (Character.isISOControl(identifier.charAt(i))) {
            firstString = identifier.replaceAll(identifier.charAt(i), "CTRL");
        }

    return firstString;
}

The code above is meant to replace every ISO control character in the string 'identifier' with "CTRL". However, I get this error: "char cannot be converted to java.lang.String"

Can someone help me fix and improve my code so that it produces the right output?

`return identifier.replaceAll("\\p{Cc}", "CTRL");` is all you need. `Cc` is a [Unicode general category](http://unicode.org/reports/tr44/#General_Category_Values) and `\\p` is [how regular expressions refer to Unicode categories](https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/regex/Pattern.html#ucc). — VGR, Aug 01 '22 at 16:15
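A minimal sketch of the regex approach from the comment, substituting "CTRL" as the question asks. In java.util.regex, \p{Cc} matches the Unicode general category Cc (U+0000 to U+001F and U+007F to U+009F), which is the same set Character.isISOControl accepts. The class name SqueakyClean is borrowed from the test case quoted in the comments below; the main method is only for illustration.

```java
class SqueakyClean {
    // Replace every character in Unicode general category Cc ("Control") with "CTRL".
    // Cc covers U+0000..U+001F and U+007F..U+009F, the same range Character.isISOControl accepts.
    static String clean(String identifier) {
        return identifier.replaceAll("\\p{Cc}", "CTRL");
    }

    public static void main(String[] args) {
        // NUL, CR and DEL are all control characters, so each becomes "CTRL".
        System.out.println(clean("my\0\r\u007FId")); // myCTRLCTRLCTRLId
    }
}
```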

Mihe · Answer 1 · 2022-08-01T14:23:21.760


String#replaceAll expects String arguments, and its first argument is interpreted as a regular expression. Use String#replace instead.
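A small illustration of the difference between the two methods, using a hypothetical input "a.b": replaceAll compiles its first argument as a regular expression, while replace matches it literally.

```java
class ReplaceVsReplaceAll {
    public static void main(String[] args) {
        String text = "a.b";
        // replaceAll treats "." as a regex that matches any character,
        // so all three characters are replaced.
        System.out.println(text.replaceAll(".", "x")); // xxx
        // replace treats "." as literal text, so only the dot is replaced.
        System.out.println(text.replace(".", "x"));    // axb
    }
}
```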

EDIT: I hadn't noticed that you want to replace a character with a string. In that case, you can use the String#replace(CharSequence, CharSequence) overload, but you need to convert the character to a String first, e.g. with Character.toString.

Update

Example:

String text = "AB\003DE";
text = text.replace(Character.toString('\003'), "CTRL");
System.out.println(text);
// gives: ABCTRLDE
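Applied to the loop from the question, a sketch might look like the following (the class name is hypothetical). As far as I can tell, the original loop had two further problems, which this version also avoids: each iteration started again from the unmodified identifier, so only the last replacement survived, and it returned "" when the input contained no control characters at all.

```java
class ReplaceLoopClean {
    static String clean(String identifier) {
        // Start from the input, so a string without control characters comes back unchanged.
        String result = identifier;
        for (int i = 0; i < identifier.length(); i++) {
            char c = identifier.charAt(i);
            if (Character.isISOControl(c)) {
                // replace(CharSequence, CharSequence) needs text, not a char, hence Character.toString.
                // Apply it to result (not identifier) so earlier replacements are kept.
                result = result.replace(Character.toString(c), "CTRL");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(clean("my\0\r\u007FId")); // myCTRLCTRLCTRLId
    }
}
```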

edited Aug 01 '22 at 14:23

answered Aug 01 '22 at 14:03

Mihe


Thanks for the suggestion! I just tried it, and this is the new error I get: " method java.lang.String.replace(char,char) is not applicable (argument mismatch; java.lang.String cannot be converted to char)" – user16790478 Aug 01 '22 at 14:10
"ABC".replace(Character.toString('A'), "Hello") will result in "HelloBC". – Mihe Aug 01 '22 at 14:17
I've updated my answer with an example. – Mihe Aug 01 '22 at 14:23
Thanks! As I understand it, your approach works for specific ISO control characters (meaning they have to be hard-coded). If so, I find it difficult to apply to the challenge I'm solving, which needs a general solution: call the isISOControl method to determine whether a character is a control character before replacing it. – user16790478 Aug 01 '22 at 15:28
Following your approach, you can of course do something like Character.toString(identifier.charAt(i)). You can also use replaceAll with a regular expression, e.g. "AB\003DE".replaceAll("\\p{Cntrl}", "CTRL") replaces every ASCII control character with CTRL in a single step. – Mihe Aug 01 '22 at 15:37
However, it's failing a test case like: "@Test public void ctrl() { assertThat(SqueakyClean.clean("my\0\r\u007FId")).isEqualTo("myCTRLCTRLCTRLId"); }" – user16790478 Aug 01 '22 at 16:50
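A possible single-pass variant (a sketch, not taken from either answer; the class name is hypothetical): append to a StringBuilder instead of calling replace once per control character, which rescans the whole string each time. It passes the test case quoted above. One caveat about the regex suggestion: in java.util.regex, \p{Cntrl} is the POSIX class [\x00-\x1F\x7F], so by default it does not match U+0080 to U+009F, which Character.isISOControl does accept. The last two lines of main show the difference using U+0085 (NEXT LINE).

```java
class BuilderClean {
    static String clean(String identifier) {
        StringBuilder sb = new StringBuilder(identifier.length());
        for (int i = 0; i < identifier.length(); i++) {
            char c = identifier.charAt(i);
            if (Character.isISOControl(c)) {
                sb.append("CTRL"); // control character: substitute the marker
            } else {
                sb.append(c);      // anything else: copy unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("my\0\r\u007FId")); // myCTRLCTRLCTRLId

        // U+0085 is an ISO control character, but it is outside the POSIX class \p{Cntrl}.
        String nel = "a\u0085b";
        System.out.println(clean(nel));                                  // aCTRLb
        System.out.println(nel.replaceAll("\\p{Cntrl}", "CTRL").equals(nel)); // true (unchanged)
    }
}
```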

Basil Bourque · Answer 2 · 2022-08-01T16:41:58.420

Code points, and Control Picture characters

I can add two points:

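The char type has been essentially broken since Java 2, and legacy since Java 5: a single char cannot represent every Unicode character, because characters outside the Basic Multilingual Plane need two chars (a surrogate pair). It is best to use code point integers when working with individual characters.
Unicode defines characters meant for display as placeholders for control characters. See the Control Pictures section of one Wikipedia page, and another page, Control Pictures.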

For example, the NULL character at code point 0 decimal has a matching SYMBOL FOR NULL character at 9,216 decimal (U+2400): ␀. To see all the Control Picture characters, see this PDF section of the Unicode standard specification.
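A small sketch illustrating both points (the sample string and class name are hypothetical). The string holds "A", U+1F600 (an emoji outside the Basic Multilingual Plane, stored as a surrogate pair), and NUL, so its char count and code point count differ. Adding 9,216 (0x2400) to NUL's code point gives U+2400 SYMBOL FOR NULL.

```java
class CodePointDemo {
    public static void main(String[] args) {
        // "A", then U+1F600 written as its surrogate pair, then NUL.
        String s = "A\uD83D\uDE00\0";

        System.out.println(s.length());            // 4 (UTF-16 chars)
        System.out.println(s.codePoints().count()); // 3 (actual characters)

        // Shift NUL (code point 0) into the Control Pictures block.
        String picture = new StringBuilder().appendCodePoint(0 + 9_216).toString();
        System.out.println(picture.equals("\u2400")); // true
    }
}
```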

Get an array of the code point integers representing each of the characters in your string.

int[] codePoints = myString.codePoints().toArray() ;

Loop those code points. Replace those of interest.

Here is some untested code.

int[] replacedCodePoints = new int[ codePoints.length ] ;
int index = 0 ;
for ( int codePoint : codePoints )
{
    if( codePoint >= 0 && codePoint <= 32 ) // 32 is SPACE, so you may want to use 31 depending on your context.
    {
        replacedCodePoints[ index ] = codePoint + 9_216 ;  // 9,216 is the offset to the beginning of the Control Picture character range defined in Unicode.
    } else if ( codePoint == 127 )  // DEL character.
    {
        replacedCodePoints[ index ] = 9_249 ;
    } else  // Any other character, we keep as-is, no replacement.
    {
        replacedCodePoints[ index ] = codePoint ;
    }
    index ++ ;  // Set up the next loop.
}

Convert code points back into text. Use StringBuilder#appendCodePoint to build up the characters of text. You can use the following stream-based code as boilerplate (it requires import java.util.Arrays). For an explanation, see this Question.

String result = 
    Arrays
        .stream( replacedCodePoints )
        .collect( StringBuilder::new , StringBuilder::appendCodePoint , StringBuilder::append )
        .toString();
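Putting the steps together, a compilable sketch might look like this (class and method names are hypothetical). toControlPictures follows the answer's mapping: 0 to 32 are shifted by 9,216 into the Control Pictures block, and DEL (127) becomes 9,249 (U+2421). cleanWithCtrl is an adaptation, not part of the answer, that applies the same code-point approach to the question's requirement using Character.isISOControl(int).

```java
import java.util.Arrays;

class ControlPictures {
    // Map C0 controls and SPACE (0..32) to U+2400..U+2420, and DEL (127) to U+2421.
    static String toControlPictures(String text) {
        int[] codePoints = text.codePoints().toArray();
        int[] replaced = new int[codePoints.length];
        for (int index = 0; index < codePoints.length; index++) {
            int codePoint = codePoints[index];
            if (codePoint <= 32) {
                replaced[index] = codePoint + 9_216; // offset into the Control Pictures block
            } else if (codePoint == 127) {
                replaced[index] = 9_249;             // SYMBOL FOR DELETE
            } else {
                replaced[index] = codePoint;         // keep everything else as-is
            }
        }
        return Arrays.stream(replaced)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    // Same code-point iteration, but emits "CTRL" for every ISO control character.
    static String cleanWithCtrl(String identifier) {
        StringBuilder sb = new StringBuilder();
        identifier.codePoints().forEach(cp -> {
            if (Character.isISOControl(cp)) {
                sb.append("CTRL");
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toControlPictures("a\0b").equals("a\u2400b")); // true
        System.out.println(cleanWithCtrl("my\0\r\u007FId"));              // myCTRLCTRLCTRLId
    }
}
```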
