A string consists of the following, in this order:

  • An optional sequence of ASCII digits.
  • A sequence of ASCII lowercase letters.

I'm trying to do the split in one single regex that I could use like this:

String string = "123abc";
var array = string.split(...);
System.out.println(java.util.Arrays.toString(array));
// prints [123, abc]

The closest regex I've come to is the following:

(?<=\d+)

Example:

String string = "123abc";
var array = string.split("(?<=\\d+)");
System.out.println(java.util.Arrays.toString(array));
// prints [1, 2, 3, abc]

Technically, I could do this without any regex, but here it's important that it be done with a regex.

Here is a non-regex solution, to show that I can do it the usual way:

String string = "123abc";
int i = 0;
// Advance i past the leading digits.
while (i < string.length() && Character.isDigit(string.charAt(i))) {
  i++;
}
String[] array = { string.substring(0, i), string.substring(i) };
System.out.println(java.util.Arrays.toString(array));
// prints [123, abc]

Another way of doing it would be:

String string = "123abc";
String[] array = {
    string.replaceAll("\\D", ""),
    string.replaceAll("\\d", "")
  };
System.out.println(java.util.Arrays.toString(array));
// prints [123, abc]

Matching examples:

In:                 Out:
123abc              [ "123", "abc" ]
0a                  [ "0", "a" ]
a                   [ "", "a" ]
Olivier Grégoire
  • @WiktorStribiżew Thanks but it doesn't work for `"abc".split(...)` which returns `[ "abc" ]`, not `[ "", "abc" ]`. – Olivier Grégoire Dec 20 '18 at 19:36
  • Then match with `"(\\d*)(\\D+)"` or `"(\\d*)(\\D*)"` and get Group 1 and 2 – Wiktor Stribiżew Dec 20 '18 at 19:38
  • Is it also important to use `.split()`? If I wanted to use regex for this, I'd use matching groups instead: `string.matches("(\d+)([a-z]+)")` – Geoffrey Wiseman Dec 20 '18 at 19:39
  • @GeoffreyWiseman Yes, it's important: I'm actually code-golfing something and split is usually the best way to win huge amounts of bytes. I've already golfed everything else but that part, and even with a 20-characters regex, I'd still gain bytes in the end. So yes, there should only be a `split`. As shown in the solutions that work without regex or without split, I have no issue to make it work without split. I'm only interested in a `split` solution. – Olivier Grégoire Dec 20 '18 at 19:43
  • Try `split("(?<=\\d|^)(?=[a-z])")` or `split("(?<!\\D)(?=[a-z])")` – Wiktor Stribiżew Dec 20 '18 at 19:45
  • The [Pattern.split()](https://docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html#split(java.lang.CharSequence,int)) documentation says: `When there is a positive-width match at the beginning of the input sequence then an empty leading substring is included at the beginning of the resulting array. A zero-width match at the beginning however never produces such empty leading substring.` – Venkata Raju Dec 20 '18 at 21:50
  • @VenkataRaju You can answer that: saying "what you want is not possible" is a valid answer! ;) – Olivier Grégoire Dec 20 '18 at 21:52
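
For reference, the capture-group alternative suggested in the comments above could look roughly like the following. This is only a sketch: the class name `GroupSplitDemo` and the helper `parts` are made up for illustration, and the pattern is the `"(\\d*)(\\D+)"` variant from the comment. It does not use `split`, so it does not meet the code-golf constraint; it only shows that groups can return the empty leading part.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSplitDemo {
    // Group 1: optional leading digits. Group 2: the non-digit remainder.
    private static final Pattern PARTS = Pattern.compile("(\\d*)(\\D+)");

    // Hypothetical helper: returns { digits, rest }, where digits may be empty.
    static String[] parts(String s) {
        Matcher m = PARTS.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected input: " + s);
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parts("123abc"))); // prints [123, abc]
        System.out.println(Arrays.toString(parts("0a")));     // prints [0, a]
        System.out.println(Arrays.toString(parts("a")));      // prints [, a]
    }
}
```

Because `\d*` can match the empty string, group 1 is `""` rather than missing when the input has no digits.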

1 Answer

Pattern.split() documentation says:

When there is a positive-width match at the beginning of the input sequence then an empty leading substring is included at the beginning of the resulting array. A zero-width match at the beginning however never produces such empty leading substring.

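So what you are trying to achieve may not be possible with a single regular expression passed to split: for an input such as "a", the only place to split is a zero-width match at index 0, and that never yields the leading empty string.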
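
The documented behavior can be observed with a small test. This is a minimal sketch, assuming Java 8 or later; the class name `SplitLeadingEmptyDemo` and the helper `splitZeroWidth` are made up for illustration, and the pattern is one of those suggested in the comments on the question.

```java
import java.util.Arrays;

public class SplitLeadingEmptyDemo {
    // Zero-width pattern: split where no non-digit precedes and a lowercase letter follows.
    static String[] splitZeroWidth(String s) {
        return s.split("(?<!\\D)(?=[a-z])");
    }

    public static void main(String[] args) {
        // Zero-width match in the middle of the input: splits as wanted.
        System.out.println(Arrays.toString(splitZeroWidth("123abc"))); // prints [123, abc]

        // Zero-width match at index 0: no empty leading substring is produced.
        System.out.println(Arrays.toString(splitZeroWidth("a")));      // prints [a]

        // Positive-width match at index 0: the empty leading substring is kept.
        System.out.println(Arrays.toString(",a".split(",")));          // prints [, a]
    }
}
```

Only the last call returns a leading empty string, and it does so by consuming a character, which the digit/letter input format does not provide.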

Venkata Raju