Java split by alphabetic char creates an empty value in array

Question

I want to split my string before every occurrence of an alphabetic character.

For example:

"s1l1e13" to an array of: ["s1","l1","e13"]

When I try this simple split by regex, I get some weird results:

String testStr = "s1l1e13";
Arrays.toString(testStr.split("(?=[a-z])"));

gives me the array of:

["","s1","l1","e13"]

How can I do the split without getting the empty array element?

I tried a couple more things:

testStr = "s1";
Arrays.toString(testStr.split("(?=[a-z])"));

does return the correct array: ["s1"]

But when I try to use substring:

testStr = "s1l1e13";
Arrays.toString(testStr.substring(1).split("(?=[a-z])"));

I get ["1","l1","e13"] in return.

What am I missing?

I'd use Google Guava; it's more readable and has a lot of useful classes that come in handy. "Splitter.on('.').omitEmptyStrings().split("how.are.you?");" You'll get more readable code and won't have to mess with regular expressions. – vach, Jun 02 '14 at 14:49

hwnd · Accepted Answer · 2014-06-02T14:45:06.010

4

Your lookahead matches at every position that is followed by a character from a to z, which marks the following positions:

 s1 l1 e13
^  ^  ^

So splitting on just the lookahead returns ["", "s1", "l1", "e13"]

You can use a negative lookbehind here. It asserts that the position is not the beginning of the string (\A), so the match at index 0 is rejected.

String s = "s1l1e13";
String[] parts = s.split("(?<!\\A)(?=[a-z])");
System.out.println(Arrays.toString(parts)); //=> [s1, l1, e13]

edited Jun 02 '14 at 14:45

answered Jun 02 '14 at 14:15

hwnd


plenty of working answers but you are the fastest gun, works! thanks! – amitben Jun 02 '14 at 14:39

score 2 · Answer 2 · edited May 23 '17 at 12:20

Your problem is that (?=[a-z]) means "place before [a-z]" and in your text

s1l1e13

you have 3 such places. I will mark them with |

|s1|l1|e13

so split (unfortunately correctly) produces "" "s1" "l1" "e13" and does not automatically remove the leading empty element for you.

To solve this problem you have at least two options:

1. Make sure there is something before the place you split on, so that it cannot be at the start of your string. For instance, you can use (?<=\\d)(?=[a-z]) if you want to split after a digit but before a letter.
2. (PREFERRED SOLUTION) Start using Java 8, which automatically removes the empty string at the start of the result array when the regex match there is zero-length (look-arounds are zero-length).
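
As a rough sketch of option 1 (the class and method names are made up for illustration, they are not from the question), the lookbehind for a digit keeps the split from firing at index 0:

```java
import java.util.Arrays;

class DigitLetterSplit {
    // Splits only at positions with a digit on the left and a lowercase letter on the right.
    // Hypothetical helper name; the regex is the one suggested in option 1.
    static String[] splitBeforeLetters(String input) {
        return input.split("(?<=\\d)(?=[a-z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitBeforeLetters("s1l1e13"))); // [s1, l1, e13]
        System.out.println(Arrays.toString(splitBeforeLetters("s1")));      // [s1]
    }
}
```

This does not depend on the Java 8 change, so it should behave the same on older versions. It does assume that every letter you want to split before is preceded by a digit, as in the sample input.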

thank you very much @Pshemo for your fully detailed answer, this explains a lot! — amitben, Jun 02 '14 at 14:41

score 0 · Answer 3 · answered Jun 02 '14 at 14:16

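The first match is at the very start of the string, before "s". The pattern is a zero-width lookahead, so it only checks that a letter follows and does not need to consume anything. Since "s" is alphabetic, the position before it matches, and split reports the (empty) text in front of that position as "".

If you want the regex to always consume something, you could use ".+(?=[a-z])", but keep in mind that split discards whatever the pattern consumes: on "s1l1e13" this greedy pattern matches "s1l1" and leaves ["", "e13"].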

score 0 · Answer 4 · answered Jun 02 '14 at 14:17

The problem is that the initial "s" counts as an alphabetic character, so the regex also splits at the position before it.

There is nothing before the "s", so split represents that nothing as an empty string (not null) in the first slot. The same thing does not happen at the end: with this lookahead a trailing letter simply becomes the last element, and split drops trailing empty strings by default.

If this is the only string you're splitting, or if every string you split starts with a letter, just drop the first element of the array. Otherwise, you'll probably need to loop through each array after you create it so that you can drop empty elements.
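
A minimal sketch of the "loop and drop empty elements" idea, assuming the lookahead from the question; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DropEmptyParts {
    // Splits before every lowercase letter, then keeps only the non-empty pieces.
    static String[] splitNonEmpty(String input) {
        List<String> kept = new ArrayList<>();
        for (String part : input.split("(?=[a-z])")) {
            if (!part.isEmpty()) {
                kept.add(part);
            }
        }
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitNonEmpty("s1l1e13"))); // [s1, l1, e13]
    }
}
```

The filter is a no-op on Java versions where split already omits the leading empty string, so the result is the same either way.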

score 0 · Answer 5 · answered Jun 02 '14 at 14:19

So it seems each of your matches has the pattern x###, where x is a letter and ### is one or more digits.

I'd use the following regex and collect its matches instead of splitting:

([a-z][0-9]+)
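
One way this could look in Java, as a sketch only: the regex is the one above, while the class name, method name, and the choice of returning a List are my own assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterDigitMatcher {
    // One lowercase letter followed by one or more digits, captured as group 1.
    private static final Pattern TOKEN = Pattern.compile("([a-z][0-9]+)");

    // Collects every match in order instead of splitting between them.
    static List<String> tokens(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(tokens("s1l1e13")); // [s1, l1, e13]
    }
}
```

Matching this way never produces empty elements, but anything that does not fit the letter-plus-digits shape is silently skipped, which may or may not be what you want.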

answered Jun 02 '14 at 14:19

Matias Cicero

