-1

I am trying to match two strings in Java, one of which contains an invalid whitespace or some other odd character. I have tried a couple of things, all captured in the code below, but none of them work: the tester still reports the strings as unmatched.

package org.example;

import java.lang.reflect.Array;
import java.util.Arrays;

public class Tester {

    public static void main(String[] args){
        String actual = "*** Transaction with whitespace character ****** for replacement ***";
        String expected = "*** Transaction with whitespace character ****** for replacement ***";

        actual = actual.replaceAll("\uFFFD", "");

        String re = "[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\u0001\\u0000-\\u0010\\uFFFF]";
        actual.replaceAll(re, "");

        String re1 = "[^\\x09\\x0A\\x0D\\x20-\\xD7FF\\xE000-\\xFFFD\\x10000-x10FFFF]";
        actual.replaceAll(re1, " ");

        actual.replaceAll("\u001E", "");
        //actual.replaceAll("\uFFFD", "\"");

        String re2 = "[^\u0009\r\n\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF]";
        actual.replaceAll(re2, "");

        actual = translate(actual);

        if(actual.equals(expected))
            System.out.println("Matched !! ");
        else
            System.out.println("Unmatched !!!");


    }

    static String translate(String input)
    {

        // Creating array of string length
        char[] output = new char[input.length()];

        for (int i = 0; i < input.length(); i++)
        {
            char tch = input.charAt(i);
            if (tch == 65533) // char �
                tch = (char)233; // é
            else
                output[i] = input.charAt(i);
        }
        return Arrays.toString(output);
    }
}
Pshemo
JavaMan

2 Answers

4

actual.replaceAll(re, "");

This doesn't do anything whatsoever. replaceAll does not change the thing you call it on; it makes a new string with the replacement applied.

Try actual = actual.replaceAll(re, ""); instead.
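A minimal sketch of the difference between discarding and assigning the result. The class name `ReplaceAllDemo`, the helper `stripReplacementChar`, and the sample string are hypothetical and not taken from the question:

```java
public class ReplaceAllDemo {

    // Removes every U+FFFD (replacement character) and returns the NEW string.
    static String stripReplacementChar(String input) {
        return input.replaceAll("\uFFFD", "");
    }

    public static void main(String[] args) {
        String actual = "abc\uFFFDdef";

        // Result is discarded, so 'actual' still contains U+FFFD.
        actual.replaceAll("\uFFFD", "");
        System.out.println(actual.length()); // 7

        // Result is assigned back, so 'actual' now refers to the cleaned string.
        actual = stripReplacementChar(actual);
        System.out.println(actual.length()); // 6
    }
}
```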

rzwitserloot
1

The Javadoc for the String class states:

The String class represents character strings. All string literals in Java programs, such as "abc", are implemented as instances of this class. Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared.

In Java, strings are immutable, so any method that appears to "modify a String" actually creates and returns a new String with the desired modification applied.
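As a small illustration of that point (a sketch only; `ImmutabilityDemo`, `tidy`, and the sample text are made-up names for this example), other String methods behave the same way and leave the receiver untouched:

```java
public class ImmutabilityDemo {

    // Returns a trimmed copy with "!" appended; the argument is never modified.
    static String tidy(String input) {
        return input.trim().concat("!");
    }

    public static void main(String[] args) {
        String original = "  padded  ";
        String tidied = tidy(original);

        System.out.println("[" + original + "]"); // [  padded  ]
        System.out.println("[" + tidied + "]");   // [padded!]
    }
}
```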

The same applies to replaceAll. Out of curiosity, if you look at the implementation (String.replaceAll delegates to Matcher.replaceAll), you can see the following:

   public String replaceAll(String replacement) {
        reset();
        boolean result = find();
        if (result) { // <--- match found
            StringBuilder sb = new StringBuilder();
            do {
                appendReplacement(sb, replacement);
                result = find();
            } while (result);
            appendTail(sb);
            return sb.toString(); // <--- new string
        }
        return text.toString(); // <--- no match: the text is returned as-is
    }

When a match is found, a new string is returned at the end (sb.toString()), which is the result of:

replacing each matching subsequence by the replacement string, substituting captured subsequences as needed
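To make the connection concrete, here is a sketch that spells out the Pattern/Matcher form, which the String.replaceAll Javadoc documents as yielding the same result as `str.replaceAll(regex, repl)`. The class `MatcherReplaceDemo`, the helper `replaceViaMatcher`, and the sample input are hypothetical:

```java
import java.util.regex.Pattern;

public class MatcherReplaceDemo {

    // Spelled-out form of input.replaceAll(regex, replacement).
    static String replaceViaMatcher(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "a\u001Eb\u001Ec";

        String viaString = input.replaceAll("\u001E", "");
        String viaMatcher = replaceViaMatcher(input, "\u001E", "");

        System.out.println(viaString.equals(viaMatcher)); // true
        System.out.println(viaMatcher);                   // abc
        // The input itself is untouched by either call.
        System.out.println(input.length());               // 5
    }
}
```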

Therefore you need to change from

actual.replaceAll(re, ""); 

to

actual = actual.replaceAll(re, "");

The code with the fixes:

package org.example;

import java.util.Arrays;

public class Tester {

    public static void main(String[] args){
        String actual = "*** Transaction with whitespace character ****** for replacement ***";
        String expected = "*** Transaction with whitespace character ****** for replacement ***";

        actual = actual.replaceAll("\uFFFD", "");

        String re = "[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\u0001\\u0000-\\u0010\\uFFFF]";
        actual = actual.replaceAll(re, "");

        String re1 = "[^\\x09\\x0A\\x0D\\x20-\\xD7FF\\xE000-\\xFFFD\\x10000-x10FFFF]";
        actual = actual.replaceAll(re1, " ");

        actual = actual.replaceAll("\u001E", "");
        //actual.replaceAll("\uFFFD", "\"");

        String re2 = "[^\u0009\r\n\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF]";
        actual = actual.replaceAll(re2, "");

        actual = translate(actual);

        if(actual.equals(expected))
            System.out.println("Matched !! ");
        else
            System.out.println("Unmatched !!!");


    }

    static String translate(String input)
    {
        // Creating array of string length
        char[] output = new char[input.length()];

        for (int i = 0; i < input.length(); i++)
        {
            char tch = input.charAt(i);
            if (tch == 65533) // char � (U+FFFD)
                tch = (char)233; // é
            // Store the (possibly substituted) character
            output[i] = tch;
        }
        // Build the string from the characters; Arrays.toString would yield "[a, b, c]"
        return new String(output);
    }
}
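A side note on translate-style helpers, offered as a sketch rather than as part of the original fix: Arrays.toString(char[]) produces a debug-style listing with brackets and comma separators, so its result cannot equal the plain expected string, whereas new String(char[]) joins the characters themselves. `CharArrayDemo` and `join` are hypothetical names:

```java
import java.util.Arrays;

public class CharArrayDemo {

    // Builds a String from the characters themselves.
    static String join(char[] chars) {
        return new String(chars);
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c'};

        System.out.println(Arrays.toString(chars)); // [a, b, c]
        System.out.println(join(chars));            // abc
    }
}
```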
dreamcrash