For a current programming project I need to convert words containing non-ASCII umlauts like 'ä', 'ö' or 'ü' into Strings in which those characters are written as Unicode escape sequences (e.g. \u00F6).

To achieve this I wanted to try out the 'new' Java Streams. So far I have been able to obtain the indices of all characters that fall outside the ASCII range and thus need to be replaced.

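    import java.util.stream.IntStream;

    public static void replaceUmlauts() {
      char[] chars = "persönlich".toCharArray();
      // collect the index of every character outside the ASCII range (0..127)
      int[] ind = IntStream.range(0, chars.length).filter(i -> chars[i] > 127).toArray();
    }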

However, I have not found a clean way to replace the umlauts at the identified indices with their respective Unicode escape sequences. To stay within one paradigm I would like to find a Stream solution, but I am also open to other efficient ways of solving the problem.

Completely different, maybe even easier, approaches to the whole problem would also be appreciated.

Marco N.
  • Java characters are already in Unicode. What exactly are you trying to do? – 4castle Aug 12 '16 at 06:47
  • Well, I want to store Java Strings in a `{somename}.properties` file. As I ran into some encoding issues (CP1252 on Windows machines and UTF-8 on other systems), I would like to persist Strings like `persönlich` as `pers\u00F6nlich` to avoid problems later on. Working with `persönlich` and similar words has been fine 'within Java' so far, but I strongly believe that the explicit conversion is actually useful for persistence. – Marco N. Aug 12 '16 at 06:49
  • @MarcoN. Before you do anything at all, read [this](http://www.joelonsoftware.com/articles/Unicode.html). Don't try to be clever, or you might end up in a situation like [him](http://stackoverflow.com/questions/38890321/recover-wrongly-encoded-character-java/38890501). – Kayaman Aug 12 '16 at 07:00
  • Rather than use Streams, you should use the solutions in the question I linked. None of the answers seem to contain solutions which are candidates for a Stream. – 4castle Aug 12 '16 at 07:02
  • @Kayaman: Thanks for the link. I am reading through it right now. And I have to admit I was already worried that this approach might be somewhat broken - I guess I just grew paranoid due to some very time consuming issues I suffered lately. – Marco N. Aug 12 '16 at 07:08
  • And @4castle: Thanks! Good to know that I actually do not even need to take care of it. So I have written the initial files manually and I guess I just missed this fact. This is going to save me quite some time. – Marco N. Aug 12 '16 at 07:08
  • @MarcoN.: Here is a stream-only solution I figured out: `String str = raw.chars().mapToObj(c -> c > 127 ? String.format("\\u%04x", c) : String.valueOf((char) c)).collect(Collectors.joining());` – Andreas Brunnet Aug 12 '16 at 07:15
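
For the `.properties` use case discussed in the comments, manual escaping may not be needed at all: `java.util.Properties.store(OutputStream, String)` writes ISO 8859-1 and emits characters outside the printable ASCII range as `\uXXXX` escapes, and `load` decodes them again. Below is a minimal sketch of that round trip. The class name `PropertiesEscapeDemo`, the helper methods and the key `greeting` are made up for illustration, and an in-memory buffer stands in for the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; the key "greeting" is an assumption for illustration.
class PropertiesEscapeDemo {

    // Returns the text that Properties.store would write to a .properties file.
    static String storeToString(String key, String value) throws IOException {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        props.store(buffer, null); // null: no custom header comment
        return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Parses the file text again and returns the decoded value for the key.
    static String loadValue(String fileContent, String key) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.ISO_8859_1)));
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // The literal uses an escape for o-umlaut so this source file stays pure ASCII.
        String content = storeToString("greeting", "pers\u00F6nlich");
        System.out.println(content);                        // the escaped file text
        System.out.println(loadValue(content, "greeting")); // the original value
    }
}
```

Because the stored text is pure ASCII, the file reads the same under CP1252 and UTF-8, which is the situation described in the comments above.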

1 Answer

A simple solution would be:

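    String in = "persönlich";
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < in.length(); i++) {
        char ch = in.charAt(i);
        if (ch <= 127) {
            out.append(ch);                                  // plain ASCII: copy as-is
        } else {
            out.append(String.format("\\u%04x", (int) ch));  // non-ASCII: four-digit hex escape
        }
    }
    System.out.println(out);  // pers\u00f6nlich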

Or if you want to do it "streamish":

    String text = "persönlich";
    StringBuilder result = new StringBuilder();
    text.chars().forEachOrdered(c -> result.append(c < 128 ? (char) c : String.format("\\u%04X", c)));
    System.out.println(result);
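
If the conversion is needed in more than one place, the same idea can be wrapped in a small helper that avoids the mutable `StringBuilder` by collecting with `Collectors.joining()`. This is only a sketch: the class name `UnicodeEscaper` and the method name `escapeNonAscii` are made up for illustration. It relies on `String.chars()` yielding UTF-16 code units, so a character outside the Basic Multilingual Plane comes out as two escapes (its surrogate pair).

```java
import java.util.stream.Collectors;

// Hypothetical helper class, not part of any library.
class UnicodeEscaper {

    // Copies ASCII characters unchanged and replaces every other UTF-16
    // code unit with a backslash, the letter u and four uppercase hex digits.
    static String escapeNonAscii(String input) {
        return input.chars()
                .mapToObj(c -> c < 128
                        ? String.valueOf((char) c)
                        : String.format("\\u%04X", c))
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // The literal uses an escape for o-umlaut so this source file stays pure ASCII.
        System.out.println(escapeNonAscii("pers\u00F6nlich"));
    }
}
```

For the input `persönlich` this prints `pers\u00F6nlich`; a pure ASCII string is returned unchanged.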
Guenther