1

I have a String variable with a value such as: this is test-str-ing_łóśżćń.

I would like to replace these chars:

(space), -, ł, ó, ś, ż, ć, ń with these:

_, _, l, o, s, z, c, n.

By this I mean that if the parser finds, for example, the char - (which is second in the first list), it should be replaced with the char at the same position in the second list, which in this example is _.

The char ó should be replaced with char o.

The char ń should be replaced with char n.

In my case the list of characters to replace is quite long, and looping over the string once for each character to replace would not be efficient enough.

I know the replaceAll() method, but it accepts only one input String and one output String.

So I am looking for a method that will allow me to work on arrays/lists of Strings instead of a single String.

Please give me some help.

masterdany88
  • I suggest you give a shot at apache commons lang & `StringUtils` http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#replaceChars%28java.lang.String,%20java.lang.String,%20java.lang.String%29 – dotvav Sep 16 '15 at 09:35
  • `replaceAll` is heavyweight (regex). `replace()`, which comes in a few variants, is faster. – Jacek Cz Sep 16 '15 at 09:36
  • The professional approach seems to be implementing "codepage" operations with CharsetProvider and related classes. I have seen something like this for the old Polish code pages (852, Mazovia) and converters. – Jacek Cz Sep 16 '15 at 09:39
  • 2
    Are you trying to do [this](http://stackoverflow.com/q/3322152/335858)? – Sergey Kalinichenko Sep 16 '15 at 09:40
  • Have a look at the question linked by dasblinkenlight. I have the strong feeling that that's what you're after. Add a second call to `replaceAll` and replace spaces, minuses etc. with an underscore. – Thomas Sep 16 '15 at 09:47
  • Using `replaceAll()` when a lighter alternative exists is overkill – Jacek Cz Sep 16 '15 at 09:49

3 Answers

4

Use java.text.Normalizer to decompose accented letters into a base letter plus "combining diacritical marks."

import java.text.Normalizer;
import java.text.Normalizer.Form;

String base = Normalizer.normalize(accented, Form.NFKD)
    .replaceAll("\\p{M}", "");

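This performs a compatibility decomposition (the D in NFKD) and then removes the combining marks (\p{M}).

Some replacements are still needed: characters without a decomposition, and plain characters such as the space, have to be mapped separately.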
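A minimal sketch of how the normalization and the remaining replacements could be combined. The class name `AsciiFolder` and the method `fold` are hypothetical, chosen only for illustration; the extra `replace` calls cover ł, the space and the minus sign. The Polish letters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.text.Normalizer;

// Hypothetical helper class; the names are illustrative only.
class AsciiFolder {

    static String fold(String input) {
        // NFKD splits an accented letter into a base letter plus combining mark(s).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        return decomposed
                .replaceAll("\\p{M}", "")   // drop the combining marks
                .replace('\u0142', 'l')     // U+0142 (l with stroke) has no decomposition
                .replace(' ', '_')          // char-based replace, no regex involved
                .replace('-', '_');
    }

    public static void main(String[] args) {
        // The escapes spell the six Polish letters from the question.
        String sample = "this is test-str-ing_\u0142\u00F3\u015B\u017C\u0107\u0144";
        System.out.println(fold(sample)); // prints "this_is_test_str_ing_loszcn"
    }
}
```

Only the lowercase ł is handled here; an uppercase variant would need its own `replace` call.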

Joop Eggen
  • OK. I will give it a try, because it seems to be the most elegant proposal. – masterdany88 Sep 16 '15 at 09:54
  • 1
    Tested. It doesn't convert `ł` to `l`, and I still need to change the space to `_` – masterdany88 Sep 16 '15 at 10:00
  • I've added some extra `replaceAll` calls to achieve my goal: `return Normalizer.normalize(stringToConvert,Form.NFKD).replaceAll("\\p{M}", "").replaceAll(" ", "_").replaceAll("ł", "l");`. Now it works. Thanks. – masterdany88 Sep 16 '15 at 12:33
  • It would suffice to do `.replace("ł", "l")` and maybe even `.replace(' ', '_')`, without the regex overhead. – Joop Eggen Sep 16 '15 at 13:28
  • 2
    The string replace is faster than regex replace and the char replace is faster than string replace since the new string length is known ahead of time. – RokL Sep 17 '15 at 15:13
1
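    // convertChars[i] is replaced by toChars[i]; both arrays must have the same length
    char[] out = new char[src.length()];
    for (int j = 0; j < src.length(); j++) {
        char inputChar = src.charAt(j);
        for (int i = 0; i < convertChars.length; i++) {
            if (inputChar == convertChars[i]) {
                inputChar = toChars[i];
                break; // stop at the first match so a replaced char is not mapped again
            }
        }
        out[j] = inputChar;
    }
    String out2 = new String(out);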

Extracted from a larger codebase without an IDE; not tested. The loop does not (I hope) allocate objects, so it should not degrade speed.
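For reference, the same single-pass idea can be wrapped in a self-contained method that takes the two positional lists as Strings, as the question describes. This is only a sketch: `PositionalReplace` and `replaceChars` are hypothetical names, and although the signature resembles the `StringUtils.replaceChars` suggestion from the comments, it is not that library's implementation.

```java
// Hypothetical helper; the class and method names are illustrative only.
class PositionalReplace {

    /** Replaces each char found in 'from' with the char at the same index in 'to'. */
    static String replaceChars(String src, String from, String to) {
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("from and to must have the same length");
        }
        char[] out = src.toCharArray();
        for (int j = 0; j < out.length; j++) {
            int pos = from.indexOf(out[j]); // linear scan of the source list
            if (pos >= 0) {
                out[j] = to.charAt(pos);    // same position in the target list
            }
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(replaceChars("a-b c", " -", "__")); // prints "a_b_c"
    }
}
```

The input is traversed once; the cost per character grows with the length of the `from` list, which is usually acceptable for a few dozen entries.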

Jacek Cz
0

Make a static lookup table:

private static final char[] substitutions = new char[65536];
static {
    // Initialize every entry to map to itself.
    // The index is an int: a char counter would wrap around at 65535 and never terminate.
    for (int c = 0; c < substitutions.length; c++) {
        substitutions[c] = (char) c;
    }
    // Now add mappings.
    substitutions['-'] = '_'; // Map source->target character
    // ... Add the rest
}
// LATER IN Code
char[] stringChars = inputString.toCharArray();
for (int i = 0; i < stringChars.length; i++) {
    stringChars[i] = substitutions[stringChars[i]];
}
outputString = new String(stringChars);
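A complete, runnable variant of the lookup-table idea might look like the sketch below. The class name `CharTable`, the method `translate`, and the way the table is filled from two parallel Strings are assumptions added for illustration; the mappings are the ones listed in the question, written as Unicode escapes.

```java
// Hypothetical wrapper class; names other than 'substitutions' are illustrative.
class CharTable {
    // One slot per possible char value (65536 entries).
    private static final char[] substitutions = new char[Character.MAX_VALUE + 1];

    static {
        for (int c = 0; c < substitutions.length; c++) {
            substitutions[c] = (char) c; // identity mapping by default
        }
        // Source and target lists from the question, matched by position.
        String from = " -\u0142\u00F3\u015B\u017C\u0107\u0144";
        String to = "__loszcn";
        for (int i = 0; i < from.length(); i++) {
            substitutions[from.charAt(i)] = to.charAt(i);
        }
    }

    static String translate(String input) {
        char[] chars = input.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = substitutions[chars[i]]; // one array read per character
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(translate("test-str ing")); // prints "test_str_ing"
    }
}
```

Each character costs a single array lookup no matter how many mappings exist, at the price of a fixed-size table.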
RokL
  • Probably one of the fastest algorithms (what about RAM? maybe it isn't a problem?). I used this technique extensively in C in the DOS days; the equivalent of `substitutions` took 256 bytes, and a highly optimizing C compiler performs the basic operation in 2-3 CPU cycles. – Jacek Cz Sep 16 '15 at 16:10
  • 1
    That array is 128 KB, which is nothing on today's machines. – RokL Sep 17 '15 at 15:11