I am normalizing accented text using the following code, taken from this answer on accent removal:
import java.text.Normalizer;
String accented = "árvíztűrő tükörfúrógép";
String normalized = Normalizer.normalize(accented, Normalizer.Form.NFD);
normalized = normalized.replaceAll("[^\\p{ASCII}]", "");
System.out.println(normalized);
When I run this from within IntelliJ (as part of a unit test), it gives the expected result:
arvizturo tukorfurogep
If I run the same test from the command line (via Gradle), I get:
ArvAztArA tAkArfArAgA
In both cases, I'm using the same PC and Java 1.8.0_151.
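One possible way to narrow this down is sketched below. It assumes that the difference comes from how the bytes of the source file are decoded at compile time; that is a guess, not something confirmed here. The class name `AccentCheck` and the helper `misdecode` are hypothetical names used only for illustration. The sample string is written with Unicode escapes so that the literal no longer depends on the file encoding, and `misdecode` simulates UTF-8 bytes being read as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class AccentCheck {

    // Same approach as in the question: decompose (NFD), then drop every non-ASCII character.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    // Hypothetical helper: simulates UTF-8 text being decoded with a single-byte
    // charset (ISO-8859-1 is an assumption chosen for illustration).
    static String misdecode(String input) {
        return new String(input.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The sample string from the question, written with Unicode escapes so the
        // literal is independent of the encoding the compiler assumes for this file.
        String accented = "\u00e1rv\u00edzt\u0171r\u0151 t\u00fck\u00f6rf\u00far\u00f3g\u00e9p";

        // Default charset of the running JVM; may differ between IDE and command line.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));

        // Correctly decoded input.
        System.out.println(stripAccents(accented));

        // Deliberately mis-decoded input, for comparison with the command-line output.
        System.out.println(stripAccents(misdecode(accented)));
    }
}
```

If the escaped literal gives the expected result in both environments, and the mis-decoded variant shows a capital A in place of every accented letter, that would point towards the source file encoding rather than `Normalizer` itself.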
The relevant parts from build.gradle:
apply plugin: 'java'
apply plugin: 'idea'
sourceCompatibility = 1.8
targetCompatibility = 1.8
dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.12'
}
What causes this different behaviour? And how do I ensure I get the expected result everywhere?