
I read an ISO-8859-1 encoded String from a third-party system. I have to split this String on the character ¦, which has the value 166 in ISO-8859-1. The following code doesn't work, because in Java (UTF-8) the value of ¦ comes out as 65533 (U+FFFD, the Unicode replacement character).

```java
String[] parts = isoString.split("¦");
```

I am stuck... How can I solve this? Thanks

Fredo
  • According to https://en.wikipedia.org/wiki/Vertical_bar#Solid_vertical_bar_vs_broken_bar, this character is U+00A6 in Unicode (166 in decimal). So you could simply use `.split("\u00a6")`. – JB Nizet Apr 02 '16 at 11:41
  • There's no such thing as "an ISO-8859-1 coded String" in Java. So I think your real problem is probably that you're not correctly decoding the string from whatever source you get it from. But we can't tell that, because you only show one line of code, without context. – kdgregory Apr 02 '16 at 11:46
  • It _is_ also possible, though, that you're not compiling your program with the correct encoding, so the string you're passing to `split()` is not what you think it is. In that case, *JB Nizet*'s answer will work (it's also, IMO, the best way to reference non-ASCII characters in any program). – kdgregory Apr 02 '16 at 11:47
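A minimal sketch of the decoding problem the comments describe, assuming the third system delivers raw bytes (the class name `DecodeDemo` and the sample bytes are hypothetical). A lone 0xA6 byte is not valid UTF-8, so decoding it as UTF-8 yields U+FFFD (65533), which matches the value in the question; decoding the same bytes as ISO-8859-1 yields U+00A6, and the split then works:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeDemo {
    public static void main(String[] args) {
        // Hypothetical raw bytes as a third system might send them: "ab", 0xA6 (166), "cd"
        byte[] raw = {'a', 'b', (byte) 166, 'c', 'd'};

        // Wrong: 0xA6 on its own is malformed UTF-8, so the decoder substitutes U+FFFD (65533)
        String wrong = new String(raw, StandardCharsets.UTF_8);
        System.out.println((int) wrong.charAt(2)); // prints 65533

        // Right: ISO-8859-1 maps byte 166 directly to U+00A6 (broken bar)
        String right = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println((int) right.charAt(2)); // prints 166

        // The escape keeps the source file pure ASCII
        System.out.println(Arrays.toString(right.split("\u00a6"))); // prints [ab, cd]
    }
}
```

So a 65533 in the decoded string usually means the bytes were decoded with the wrong charset, not that `split` is at fault.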

2 Answers


Working code:

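```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Decode the raw bytes as ISO-8859-1: byte 166 becomes U+00A6 (broken bar)
String s = new String(new byte[] {'a', 'b', (byte) 166, 'c', 'd'},
                      StandardCharsets.ISO_8859_1);
String[] split = s.split("\u00a6");
System.out.println("split = " + Arrays.toString(split));
// prints split = [ab, cd]
```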
JB Nizet

You first need to properly decode your ISO-8859-1 data into a Unicode (Java) String so that you can split it using the string literal you supplied (¦), assuming, of course, that you compile your program with the correct Unicode source encoding.
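If the data arrives as a byte stream, one way to do this decoding is to wrap the stream in an `InputStreamReader` set to ISO-8859-1 and split each line on U+00A6. This is a sketch assuming line-oriented input; `IsoSplitDemo`, `readAndSplit` and the sample bytes are hypothetical. Using the `"\u00a6"` escape keeps the source pure ASCII, so the compiler's source encoding no longer matters:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IsoSplitDemo {
    // U+00A6 BROKEN BAR, which is byte 166 in ISO-8859-1
    private static final String BROKEN_BAR = "\u00a6";

    /** Decodes each line of the stream as ISO-8859-1 and splits it on the broken bar. */
    static List<String[]> readAndSplit(InputStream in) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A limit of -1 keeps trailing empty fields instead of discarding them
                rows.add(line.split(BROKEN_BAR, -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the third system's stream: two lines
        byte[] raw = {'a', 'b', (byte) 166, 'c', 'd', '\n', 'x', (byte) 166, 'y'};
        for (String[] fields : readAndSplit(new ByteArrayInputStream(raw))) {
            System.out.println(Arrays.toString(fields));
        }
    }
}
```

The one-argument `split` drops trailing empty strings; passing `-1` as the limit keeps them, which matters if a record can end with an empty field.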

errantlinguist