So I have a text file that includes non-Unicode characters. For example:

"pr�s-*"
ESt Präs

How do I print them out, but only them? I know this Java method for replacing them:

String resultString = currentLine.replaceAll("[^\\x00-\\x7F]", "");

I don't want to replace them; I want to find them and print them out.

Filip
  • You can use a `Matcher#find` with your regex. Or use `\P{ASCII}` regex - see http://ideone.com/jezqeN – Wiktor Stribiżew Aug 12 '16 at 07:32
  • Could you explain a bit better what you mean by non-Unicode chars? I thought all chars in Java were Unicode? – Ole V.V. Aug 12 '16 at 07:33
  • @OleV.V. Indeed. Filip, "non-Unicode" is a very strange construction. Although there are some "computer-stored" characters that are not yet in the Unicode character set, the whole concept of Unicode is to be, for all practical purposes, universal. In any case, characters in the Java string datatype are Unicode. Perhaps you mean characters that are also in some other specific character set. In that case, please state what that character set is. Or, do you mean characters that are in Unicode and at least one other character set? If so, please give a list of all character sets under consideration. – Tom Blodget Aug 13 '16 at 22:55

1 Answer

You can use Matcher#find to find and print all these non-ASCII chars, either with your [^\\x00-\\x7F] regex or with \\P{ASCII}:

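import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "pr�s-*";
// \P{ASCII} matches any character outside the ASCII range \x00-\x7F
Pattern pattern = Pattern.compile("\\P{ASCII}");
Matcher matcher = pattern.matcher(s);
// find() advances to the next match; group(0) is the matched text
while (matcher.find()) {
    System.out.println(matcher.group(0));
}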

See the Java demo

See Java regex reference:

\p{ASCII} = All ASCII:[\x00-\x7F]

The uppercase \P negates the class, so \P{ASCII} matches any char other than ASCII.
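
If it helps, here is a minimal self-contained sketch of the same idea applied to several lines. The class name NonAsciiFinder and the helper findNonAscii are hypothetical names chosen for illustration, and the two sample strings stand in for lines read from your file. The non-ASCII characters are written as Unicode escapes (\uFFFD for the replacement character, \u00E4 for ä) so the example does not depend on the source file's encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonAsciiFinder {
    // \P{ASCII} matches any single character outside \x00-\x7F
    private static final Pattern NON_ASCII = Pattern.compile("\\P{ASCII}");

    // Hypothetical helper: collects every non-ASCII match in a line, in order
    static List<String> findNonAscii(String line) {
        List<String> found = new ArrayList<>();
        Matcher matcher = NON_ASCII.matcher(line);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample lines standing in for the contents of the text file
        String[] lines = { "\"pr\uFFFDs-*\"", "ESt Pr\u00E4s" };
        for (String line : lines) {
            for (String ch : findNonAscii(line)) {
                // Print each match together with its code point
                System.out.printf("%s (U+%04X)%n", ch, ch.codePointAt(0));
            }
        }
    }
}
```

Printing the code point next to each match makes it easier to tell a real letter such as ä apart from the replacement character U+FFFD, which usually indicates that the file was decoded with the wrong charset.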

Wiktor Stribiżew