
I tried to read about regex and escaping, but no luck.

I have a string that looks like this:

String s = "4/18/2015|Planned|Linux|Maintenance";

I want to split it on the delimiter '|':

String[] tokens = s.split("|");

The results I expect are:

tokens[0] is "4/18/2015"
tokens[1] is "Planned"
tokens[2] is "Linux"
tokens[3] is "Maintenance"

Instead, it gives me a weird result like this:

tokens[0] is null
tokens[1] is 4
tokens[2] is /
tokens[3] is 1

I am guessing it's because of the slashes '/' in the date. I searched many existing questions and tried the suggested methods, but to no avail.

Asked by hiew1, edited by Pshemo

3 Answers


@mushfek0001 got it right.

In most regex dialects the pipe is a metacharacter meaning alternation; so what you are asking the regex engine to do here is "split on the empty string or... the empty string".

Since the empty string matches at every position, you would in principle get an empty match everywhere. The regex engine is not a fool, though: when it finds an empty match during a split, it advances one character before looking for the next match, so the input gets cut between every pair of characters. Hence your result. (The first element is actually the empty string, not null: on Java 7 and earlier, the zero-width match at the very start of the input produces a leading empty string; Java 8 and later drop it.)
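To see the effect described above, here is a minimal sketch that runs the unescaped split on the string from the question. The class and method names (`NaivePipeSplit`, `naiveSplit`) are hypothetical, chosen only for illustration; the array contents depend on the Java version as noted in the comments.

```java
import java.util.Arrays;

public class NaivePipeSplit {
    // Hypothetical helper: splits on the *regex* "|", an alternation of two
    // empty patterns, which therefore matches the empty string everywhere.
    static String[] naiveSplit(String s) {
        return s.split("|");
    }

    public static void main(String[] args) {
        String[] tokens = naiveSplit("4/18/2015|Planned|Linux|Maintenance");
        // Every element is a single character, including "/" and "|" themselves.
        // On Java 7 and earlier, the array additionally starts with "".
        System.out.println(tokens.length);
        System.out.println(Arrays.toString(tokens));
    }
}
```

Note that the pipes survive as their own one-character tokens, which shows that the string was never split on them at all.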

Therefore, you should split against "\\|", not "|".

What is more, if you do this repeatedly, compile a Pattern once and reuse it:

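import java.util.regex.Pattern;

// Compile the regex once; "\\|" in Java source is the regex \| (a literal pipe)
private static final Pattern PIPE = Pattern.compile("\\|");

// ...

final String[] tokens = PIPE.split(yourInput);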
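The snippet above can be fleshed out into a complete, runnable class. This is a minimal sketch; the names `PipeSplitter` and `splitOnPipe` are hypothetical, and only `Pattern.compile` and `Pattern.split` come from the standard library.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PipeSplitter {
    // Compiled once and reused for every call; "\\|" is the regex \| (literal pipe).
    private static final Pattern PIPE = Pattern.compile("\\|");

    // Hypothetical helper wrapping Pattern.split.
    static String[] splitOnPipe(String input) {
        return PIPE.split(input);
    }

    public static void main(String[] args) {
        String[] tokens = splitOnPipe("4/18/2015|Planned|Linux|Maintenance");
        System.out.println(Arrays.toString(tokens));
        // prints [4/18/2015, Planned, Linux, Maintenance]
    }
}
```

Keeping the `Pattern` in a static final field avoids recompiling the regex on every call, which is the point of preferring it over calling `String.split` repeatedly in a loop.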
Answered by fge, edited by Community
  • That's some nice info der!!!!!!!!! – vks Apr 02 '15 at 04:59
  • Thanks fge! It helps a lot! I would have accepted yours, but mushfek0001 solved my problem first. Thank you so much, fge; I upvoted your answer. – hiew1 Apr 02 '15 at 05:12

Just use

split("\\x7C")

or

split("\\|")

When splitting on the pipe character '|', you need to either escape it or refer to it by its Unicode value (0x7C).
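As a quick check that these forms are interchangeable, the sketch below splits the question's string with several regexes that all match one literal '|'. The first two are the ones from this answer; the character class `[|]` and `Pattern.quote` are extra alternatives swapped in here for comparison, not part of the original answer. The class name `PipeRegexVariants` is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PipeRegexVariants {
    static final String INPUT = "4/18/2015|Planned|Linux|Maintenance";

    // Each regex below matches a single literal '|' character.
    static final String[] PATTERNS = {
        "\\|",              // backslash-escaped metacharacter
        "\\x7C",            // hex escape: 0x7C is the code point of '|'
        "[|]",              // '|' has no special meaning inside a character class
        Pattern.quote("|")  // wraps the text in \Q...\E so it is matched literally
    };

    public static void main(String[] args) {
        for (String regex : PATTERNS) {
            System.out.println(regex + " -> " + Arrays.toString(INPUT.split(regex)));
        }
    }
}
```

`Pattern.quote` is handy when the delimiter comes from a variable and you do not want to think about which characters need escaping.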

Answered by mushfek0001
  • I tried your previous suggestion split("\\|") and it worked. I need to accept your answer, but it's not letting me: there is a 10-minute limit. I will accept your answer later. – hiew1 Apr 02 '15 at 04:55
  • What you call the "hex code" here is the unicode code point; note that this is only available since Java 7! – fge Apr 02 '15 at 05:05
  • Thanks for the info man. Made the edit. – mushfek0001 Apr 02 '15 at 05:09

Escape the pipe character:

s.split("\\|");

The pipe sign in a regex means OR, so to match it literally the regex must be \|. In a Java string literal the backslash itself has to be escaped, so you write "\\|".

Or, as mushfek0001 suggested:

split("\\x7C")
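The two layers of escaping can be made visible by printing the string literal itself. This is a small sketch with a hypothetical class name (`EscapingLayers`): the Java compiler turns "\\|" into the two characters \ and |, and the regex engine then reads \| as a literal pipe. (Writing "\|" directly in Java source would not compile, because \| is not a valid string escape.)

```java
import java.util.Arrays;

public class EscapingLayers {
    // Java literal "\\|" -> actual string \| (two chars) -> regex "literal pipe"
    static final String REGEX = "\\|";

    public static void main(String[] args) {
        System.out.println(REGEX);          // prints \|
        System.out.println(REGEX.length()); // prints 2
        System.out.println(Arrays.toString("a|b".split(REGEX))); // prints [a, b]
    }
}
```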
Answered by Zaheer Ahmed