3

I want to split my string before every occurrence of an alphabetic character.

For example:

"s1l1e13" should become the array ["s1","l1","e13"].

When I try this simple regex split, I get some weird results:

String testStr = "s1l1e13";
Arrays.toString(testStr.split("(?=[a-z])"));

This gives me the array:

["","s1","l1","e13"]

How can I do the split without getting the empty array element?

I tried a couple more things:

testStr = "s1";
Arrays.toString(testStr.split("(?=[a-z])"));

This does return the correct array: ["s1"]

But when I try to use substring

testStr = "s1l1e13";
Arrays.toString(testStr.substring(1).split("(?=[a-z])"));

I get ["1","l1","e13"] in return.

What am I missing?

amitben
I'd use Google Guava; it's more readable and it has a lot of useful classes that come in handy. "Splitter.on('.').omitEmptyStrings().split("how.are.you?");" You'll get more readable code and won't have to mess with regular expressions. – vach Jun 02 '14 at 14:49

5 Answers

4

Your lookahead matches the position before every character from a to z, which marks the following positions:

 s1 l1 e13
^  ^  ^

So splitting with just the lookahead returns ["", "s1", "l1", "e13"].

You can use a negative lookbehind here. It asserts that the split position is not at the beginning of the string.

String s = "s1l1e13";
String[] parts = s.split("(?<!\\A)(?=[a-z])");
System.out.println(Arrays.toString(parts)); //=> [s1, l1, e13]
hwnd
2

Your problem is that (?=[a-z]) means "place before [a-z]" and in your text

s1l1e13

you have 3 such places. I will mark them with |

|s1|l1|e13

so split (unfortunately, correctly) produces "" "s1" "l1" "e13" and doesn't automatically remove the leading empty element for you.

To solve this problem you have at least two options:
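As a minimal sketch of two patterns that avoid the leading empty element: exclude the start of the string with `(?!^)`, or require a digit right before the split point with `(?<=[0-9])`. The class and method names below are made up for illustration, and the second pattern assumes every token ends in a digit, as in the example.

```java
import java.util.Arrays;

public class SplitOptions {
    // Option 1: split before a letter, but never at the very start of the string.
    static String[] splitNotAtStart(String s) {
        return s.split("(?!^)(?=[a-z])");
    }

    // Option 2: split only between a digit and a letter.
    // Assumes each token ends with a digit, as in "s1l1e13".
    static String[] splitDigitThenLetter(String s) {
        return s.split("(?<=[0-9])(?=[a-z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitNotAtStart("s1l1e13")));      // [s1, l1, e13]
        System.out.println(Arrays.toString(splitDigitThenLetter("s1l1e13"))); // [s1, l1, e13]
    }
}
```

Neither pattern can match at index 0, so the result does not depend on how split treats a zero-width match at the start of the input.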

Pshemo
0

The first match is found at the very start of the string because the pattern is a zero-width lookahead: it only checks that the next character is a letter and does not consume anything. Since "s" at the beginning is a letter, the position before it matches, and split emits the empty string that precedes that position.

If you want the regex to require at least one character before the split point, use "(?<=.)(?=[a-z])"

DirkyJerky
0

The problem is that the initial "s" counts as an alphabetic character, so the regex tries to split right before it.

There is nothing before the s, so the regex engine represents that nothing as an empty first element. This cannot happen at the end of the string: the lookahead needs a letter after the split point, and split discards trailing empty strings anyway.

If this is the only string you're splitting, or if every string you split starts with a letter, just drop the first element of the array. Otherwise, you'll probably need to loop through each array as you make it so that you can drop empty elements.
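Dropping the empty elements could look roughly like this. It is a sketch that assumes Java 8 or later for streams, and the class and method names are hypothetical:

```java
import java.util.Arrays;

public class DropEmpty {
    // Splits before each lowercase letter, then filters out any empty strings.
    static String[] splitAndDropEmpty(String s) {
        return Arrays.stream(s.split("(?=[a-z])"))
                .filter(part -> !part.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAndDropEmpty("s1l1e13"))); // [s1, l1, e13]
    }
}
```

Filtering works the same whether or not the input starts with a letter, so no special case for the first element is needed.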

0

So it seems your matches have the pattern x###, where x is a letter and each # is a digit.

Instead of splitting, I'd match each token with the following regex:

([a-z][0-9]+)
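One way to apply it, sketched with `java.util.regex.Pattern` and `Matcher` (the class and method names are made up for illustration): loop over `find()` and collect capture group 1 from each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenMatcher {
    // One lowercase letter followed by one or more digits, captured as group 1.
    private static final Pattern TOKEN = Pattern.compile("([a-z][0-9]+)");

    // Collects every token found in the input, in order of appearance.
    static List<String> tokens(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("s1l1e13")); // [s1, l1, e13]
    }
}
```

Unlike split, this silently skips any text that does not fit the letter-then-digits shape, which may or may not be what you want.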
Matias Cicero