
I'm using this little snippet.

string.replaceAll("[^\\p{ASCII}]","")

I want to remove the non-ASCII characters, but I have a problem. For example, the following string is getting stripped too aggressively:

final String myString = "cada dia es más cercano a Dios.";

The á is being removed, but as far as I know it is ASCII character 225. I thought this regex would replace only the non-ASCII characters, and á is an ASCII character, so why is this happening?

Maybe I've got it all wrong.

Karol Dowbecki
chiperortiz

1 Answer


á (a-acute) is not part of the ASCII character set, which covers only code points 0 to 127. It is the Unicode character 'LATIN SMALL LETTER A WITH ACUTE' (U+00E1, decimal 225), part of the Latin-1 Supplement Unicode block.

You can see it by running:

"á".codePoints()
   .mapToObj(Integer::toHexString)
   .forEach(System.out::println); // e1

To keep á you can either specifically white-list this character in the pattern

string.replaceAll("[^\\p{ASCII}á]", "")

or white-list a larger group, e.g. \p{L} (written "\\p{L}" in a Java string literal), which matches all Unicode letters.
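
As a rough sketch of the \p{L} variant, something like the following should work. The class name KeepLetters and the method name strip are made up for illustration. The non-ASCII characters are written as Unicode escapes so the example does not depend on the source file's encoding.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class KeepLetters {
    // Matches every character that is neither ASCII nor a Unicode letter.
    private static final Pattern NOT_ASCII_OR_LETTER =
            Pattern.compile("[^\\p{ASCII}\\p{L}]");

    // Removes all characters matched by the pattern above.
    static String strip(String input) {
        return NOT_ASCII_OR_LETTER.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // \u00e1 is the letter a-acute (kept); \u20ac is the euro sign,
        // which is a symbol rather than a letter (removed).
        System.out.println(strip("cada dia es m\u00e1s cercano a Dios.\u20ac"));
    }
}
```

Note that \p{L} only covers letters. If the accent is stored as a separate combining mark (a followed by U+0301) instead of the single character U+00E1, the mark is not a letter and would still be removed.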

Karol Dowbecki