I have an XML document that contains some non-printable ASCII characters such as ¢ìÂíÄ. When I try to remove them using replaceAll("([^\p{ASCII}])",""), the result still contains something like `&#199;&#233;` in place of those characters. I need to remove these characters completely.

Can anyone guide me on this? Thanks in advance.

  • The "non-printable" ASCII characters are printable since you printed them in your question. Perhaps you mean non-English ASCII characters. Define the set of "printable" ASCII characters and replace any other character with a missing character. Yes, that means writing code and not relying on complex regular expressions. – Gilbert Le Blanc Dec 20 '22 at 12:32
  • Hi Thanks for your response.. You mean read each character and replace with any other character ? – Bala murugan Dec 20 '22 at 12:35
  • Does this answer your question? [How can I replace non-printable Unicode characters in Java?](https://stackoverflow.com/questions/6198986/how-can-i-replace-non-printable-unicode-characters-in-java) – Tasos P. Dec 20 '22 at 12:37
  • Yes, read an input `String`, check each character against your array of "printable" characters, and pass along only the "approved" characters to an output `String`. – Gilbert Le Blanc Dec 20 '22 at 12:40
  • This seems odd, almost like an [xy problem](https://xyproblem.info/). Why would you want to throw away data, especially without regards for context, i.e., in which XML element it appears? That said, if you get the non-ascii characters as xml entities, you could try a regex for that, like `"\d{3};"`. A [mcve] would be helpful. – Robert Dec 20 '22 at 15:13

1 Answer

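Use the `replaceAll` method of the `String` class with a negated character class that keeps only the characters you want and replaces everything else with an empty string. The ASCII control characters are those with code values below 32, plus 127 (DEL); of these, 9 (tab), 10 (newline), and 13 (carriage return) are usually worth keeping. The pattern below keeps those three and the printable range 32 to 126, so it also removes every character above 126, including non-ASCII characters such as ¢ or ì.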

[^\x09\x0A\x0D\x20-\x7E]

Try passing this regex as the first argument to `replaceAll`, with an empty string as the replacement.
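
A minimal sketch of how that pattern might be applied in Java. The class and method names (`XmlTextCleaner`, `stripRawCharacters`, `stripNumericReferences`, `clean`) are hypothetical, chosen only for illustration. The second step is an assumption based on the question: if the unwanted characters arrive as numeric character references such as `&#199;`, they are plain ASCII text and the character-class regex alone will not touch them, so the sketch also drops references whose code point falls outside the kept range. It works on the raw text and ignores XML structure, so it would also alter attribute values and CDATA sections.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlTextCleaner {
    // Anything that is not tab, LF, CR, or printable ASCII (0x20-0x7E).
    private static final Pattern NON_PRINTABLE =
            Pattern.compile("[^\\x09\\x0A\\x0D\\x20-\\x7E]");

    // Decimal (&#199;) or hexadecimal (&#xC7;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:(\\d+)|[xX]([0-9a-fA-F]+));");

    // Removes raw characters outside the kept range.
    static String stripRawCharacters(String input) {
        return NON_PRINTABLE.matcher(input).replaceAll("");
    }

    // Same range as the regex above, expressed on a code point.
    private static boolean isKept(int codePoint) {
        return codePoint == 0x09 || codePoint == 0x0A || codePoint == 0x0D
                || (codePoint >= 0x20 && codePoint <= 0x7E);
    }

    // Removes numeric references that decode to a character outside the kept range.
    static String stripNumericReferences(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint;
            try {
                codePoint = m.group(1) != null
                        ? Integer.parseInt(m.group(1))
                        : Integer.parseInt(m.group(2), 16);
            } catch (NumberFormatException e) {
                codePoint = -1; // too large to be a valid code point: drop it
            }
            String replacement = isKept(codePoint) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Applies both steps: raw characters first, then numeric references.
    static String clean(String input) {
        return stripNumericReferences(stripRawCharacters(input));
    }

    public static void main(String[] args) {
        String xml = "<a>caf\u00E9 &#199;&#233; ok\u0007!</a>";
        System.out.println(clean(xml));
    }
}
```

References that decode to a kept character, such as `&#65;`, are left in place, so only the references for characters the regex would have removed are dropped.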