
I'm looking for a regex that will split a string as follows:

String input = "x^(24-3x)";
String[] signs = input.split("regex here");
for (int i = 0; i < signs.length; i++) { System.out.println(signs[i]); }

with the output resulting in:

"x", "^", "(", "24", "-", "3", "x", ")"

The string is split at every character. However, if there are digits next to each other, they should remain grouped in one string.

approxiblue
Zi1mann

2 Answers


You can use this lookaround based regex:

String[] signs = input.split("(?<!^)(?=\\D)|(?<=\\D)");


RegEx Breakup

(?<!^)(?=\\D)  # assert if next char is non-digit and we're not at start
|              # regex alternation
(?<=\\D)       # assert if previous character is a non-digit
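
As a quick check of the breakdown above, here is a minimal runnable sketch that applies the same split to the question's input. The class name `SplitDemo` and the helper `tokenize` are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits before every non-digit (except at the very start) and after
    // every non-digit, so consecutive digits stay together in one token.
    public static String[] tokenize(String input) {
        return input.split("(?<!^)(?=\\D)|(?<=\\D)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("x^(24-3x)")));
        // prints [x, ^, (, 24, -, 3, x, )]
    }
}
```

The split after the closing parenthesis would produce a trailing empty string, but `String.split` drops trailing empty strings by default, so it does not show up in the result.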
anubhava
  • Perfectly fine, even with explanation. Thank you. – Zi1mann Dec 05 '15 at 10:57
  • Can you share why the check for not being at start is needed? I tested it with three test cases -> "", "123", "aa123" and in all three I got the same result whether I include that or not. Regex101 is showing a difference for PCRE but for Java I cannot see any difference in results. – Aseem Bansal Dec 06 '15 at 07:54
  • @AseemBansal: Since I tested it on regex101 where it was splitting on start position as well that's why I included `(?<!^)` in this regex. – anubhava Dec 06 '15 at 08:30
  • @anubhava Thanks for sharing. I thought I was missing some condition – Aseem Bansal Dec 06 '15 at 08:41

You can also use Pattern and Matcher to split the string into tokens, which is rather readable:

String regex="\\d+|[a-z]+|[\\-()\\^]";
String  str="x^(24-3x)";

It also works with str="xxx^(24-3xyz)"; note that consecutive letters are then grouped into one token, just like digits.

Getting all the tokens is a little tricky. I use this helper, courtesy of: Create array of regex matches

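// Needs java.util.Iterator, java.util.NoSuchElementException,
// java.util.regex.Matcher, java.util.regex.MatchResult and java.util.regex.Pattern.
for (MatchResult match : allMatches(Pattern.compile(regex), str)) {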
  System.out.println(match.group() + " at " + match.start());
}

public static Iterable<MatchResult> allMatches(
      final Pattern p, final CharSequence input) {
  return new Iterable<MatchResult>() {
    public Iterator<MatchResult> iterator() {
      return new Iterator<MatchResult>() {
        // Use a matcher internally.
        final Matcher matcher = p.matcher(input);
        // Keep a match around that supports any interleaving of hasNext/next calls.
        MatchResult pending;

        public boolean hasNext() {
          // Lazily fill pending, and avoid calling find() multiple times if the
          // clients call hasNext() repeatedly before sampling via next().
          if (pending == null && matcher.find()) {
            pending = matcher.toMatchResult();
          }
          return pending != null;
        }

        public MatchResult next() {
          // Fill pending if necessary (as when clients call next() without
          // checking hasNext()), throw if not possible.
          if (!hasNext()) { throw new NoSuchElementException(); }
          // Consume pending so next call to hasNext() does a find().
          MatchResult next = pending;
          pending = null;
          return next;
        }

        /** Required to satisfy the interface, but unsupported. */
        public void remove() { throw new UnsupportedOperationException(); }
      };
    }
  };
}
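
If you only need the token strings and not the match positions, a plain `Matcher.find()` loop may be enough. The following is a minimal sketch using the same regex; the class name `TokenDemo` and the method `tokens` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenDemo {
    // Same pattern as above: a run of digits, a run of lowercase letters,
    // or a single one of the characters - ( ) ^
    private static final Pattern TOKEN = Pattern.compile("\\d+|[a-z]+|[\\-()\\^]");

    public static List<String> tokens(String str) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(str);
        // find() advances to the next match on every call.
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("x^(24-3x)"));
        // prints [x, ^, (, 24, -, 3, x, )]
    }
}
```

Keep in mind that characters the pattern does not cover (for example `+` or uppercase letters) are silently skipped by this loop, so extend the character class if your input can contain them.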
Community