System.out.println(
    Arrays.deepToString(
        "abc<def>ghi".split("(?:<)|(?:>)")
    )
);

This prints [abc, def, ghi], as if I had split on "<|>". I want it to print [abc, <def>, ghi]. Is there a way to work some regex magic to accomplish what I want here?


Perhaps a simpler example:

System.out.println(
    Arrays.deepToString(
        "Hello! Oh my!! Good bye!!".split("(?:!+)")
    )
);

This prints [Hello, Oh my, Good bye]. I want it to print [Hello!, Oh my!!, Good bye!!].

polygenelubricants
  • Duplicate of http://stackoverflow.com/questions/275768/is-there-a-way-to-split-strings-with-string-split-and-include-the-delimiters – danben Mar 09 '10 at 04:25

3 Answers


You need to take a look at the zero-width matching constructs (lookarounds), which match a position without consuming any characters:

(?=X)   X, via zero-width positive lookahead
(?!X)   X, via zero-width negative lookahead
(?<=X)  X, via zero-width positive lookbehind
(?<!X)  X, via zero-width negative lookbehind
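
To make the difference between the constructs concrete, here is a minimal sketch of splitting on a lookahead versus a lookbehind. The class and method names are made up for illustration; only `String.split` and the lookaround syntax come from the JDK.

```java
import java.util.Arrays;

public class LookaroundSplitDemo {
    // Splits at the position just BEFORE each comma, so the comma
    // stays at the start of the following piece.
    static String[] splitBefore(String input) {
        return input.split("(?=,)");
    }

    // Splits at the position just AFTER each comma, so the comma
    // stays at the end of the preceding piece.
    static String[] splitAfter(String input) {
        return input.split("(?<=,)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitBefore("a,b,c"))); // [a, ,b, ,c]
        System.out.println(Arrays.toString(splitAfter("a,b,c")));  // [a,, b,, c]
    }
}
```

Because the match itself is empty, split() has nothing to throw away, which is why the delimiter survives in the output.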
Cine

You can anchor on \b (a word boundary), which is zero-width, and use lookarounds to check for < and > on either side of it.

String s = "abc<def>ghi";
String[] bits = s.split("(?<=>)\\b|\\b(?=<)");
for (String bit : bits) {
  System.out.println(bit);
}

Output:

abc
<def>
ghi

Note that this isn't a general solution: it only splits where a word character sits directly next to the bracket. For the general case you will probably need to write a custom split method.
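
One possible shape for such a custom split method is sketched below. It is an assumption about what "custom" could mean here, not the only way to do it: the helper name `splitKeeping` is hypothetical, and it treats each whole regex match (for example a complete `<...>` token) as a delimiter that is kept in the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitKeepingDelimiters {
    // Hypothetical helper: returns the text between matches AND the
    // matches themselves, in the order they appear in the input.
    static List<String> splitKeeping(String input, Pattern delimiter) {
        List<String> parts = new ArrayList<>();
        int last = 0; // end of the previous match
        Matcher m = delimiter.matcher(input);
        while (m.find()) {
            if (m.start() > last) {
                parts.add(input.substring(last, m.start())); // text before the delimiter
            }
            if (m.end() > m.start()) {
                parts.add(m.group()); // the delimiter itself (skip empty matches)
            }
            last = m.end();
        }
        if (last < input.length()) {
            parts.add(input.substring(last)); // trailing text after the last delimiter
        }
        return parts;
    }

    public static void main(String[] args) {
        Pattern tag = Pattern.compile("<[^>]*>");
        System.out.println(splitKeeping("abc<def>ghi", tag)); // [abc, <def>, ghi]
    }
}
```

Unlike the \b version, this does not depend on which characters surround the brackets.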

Your second example suggests it's not really split() you're after but a regex matching loop. For example:

String s = "Hello! Oh my!! Good bye!!";
Pattern p = Pattern.compile("(.*?!+)\\s*");
Matcher m = p.matcher(s);
while (m.find()) {
  System.out.println("[" + m.group(1) + "]");
}

Output:

[Hello!]
[Oh my!!]
[Good bye!!]
cletus

Thanks to information from Cine, I think these are the answers I'm looking for:

System.out.println(
    Arrays.deepToString(
        "abc<def>ghi<x><x>".split("(?=<)|(?<=>)")
    )
); // [abc, <def>, ghi, <x>, <x>]


System.out.println(
    Arrays.deepToString(
        "Hello! Oh my!! Good bye!! IT WORKS!!!".split("(?<=!++)")
    )
); // [Hello!,  Oh my!!,  Good bye!!,  IT WORKS!!!]

The second one, honestly, I found by experimenting with all the different quantifiers. Neither the greedy nor the reluctant quantifier works, but the possessive one does.

I'm still not sure why.

polygenelubricants
  • Your second example isn't supposed to work. :-/ It should throw a PatternSyntaxException because the lookbehind has no obvious maximum length. That your regex compiles is a bug; that it *works* is mind-boggling, and not to be relied on. Here's what you should be using: `(?<=!)(?!!)`. That will work in any regex flavor that supports lookaheads and lookbehinds. – Alan Moore Mar 09 '10 at 06:50
  • The bug has been reported, if that's what you mean. I would advise you not to get into the habit of using variable-width expressions in lookbehinds in any case; very few regex flavors support that capability, and there's usually a better way anyway. – Alan Moore Mar 09 '10 at 07:18
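
For reference, the fixed-width pattern from Alan Moore's comment can be tried as in the sketch below. The class and method names are made up for illustration, and the `\\s*` variant that also swallows the following whitespace is an addition of this sketch rather than something from the comment.

```java
import java.util.Arrays;

public class FixedWidthLookbehindSplit {
    // Splits at every position preceded by '!' and NOT followed by '!',
    // i.e. right after each run of exclamation marks.
    static String[] splitAfterBangs(String input) {
        return input.split("(?<=!)(?!!)");
    }

    // Same split point, but the match also consumes the whitespace that
    // follows, so the pieces come back without leading spaces.
    static String[] splitAfterBangsTrimmed(String input) {
        return input.split("(?<=!)(?!!)\\s*");
    }

    public static void main(String[] args) {
        String s = "Hello! Oh my!! Good bye!! IT WORKS!!!";
        System.out.println(Arrays.toString(splitAfterBangs(s)));
        // [Hello!,  Oh my!!,  Good bye!!,  IT WORKS!!!]
        System.out.println(Arrays.toString(splitAfterBangsTrimmed(s)));
        // [Hello!, Oh my!!, Good bye!!, IT WORKS!!!]
    }
}
```

Both lookarounds here look at exactly one character, so the pattern does not depend on how a given engine handles variable-width lookbehind.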