2

I was trying to split a string into an array of single-character strings. The problem is that .split("") also returns an empty element: "test".split("") would return ["", "t", "e", "s", "t"].

The solution in the question "Split string into array of character strings" solves the problem (using .split("(?!^)")).

However, I still cannot understand why this works, and I'm not going to use a piece of code which I cannot understand just because it gets the job done.

I've read these two pages, http://www.regular-expressions.info/lookaround.html and http://ocpsoft.org/opensource/guide-to-regular-expressions-in-java-part-2/, about negative look-ahead and still cannot understand it. Can someone clarify this?

user1493813

3 Answers

5

Using "test".split("") splits the string at EVERY position before a character, resulting in ["", "t", "e", "s", "t"], because the first split (in front of the t) produces an empty entry.

The regex "(?!^)" means: split the string at every position, except the one where the string start (^) matches*:

For the regex engine, your string basically looks like this: ^test$. So the regex performs EVERY split except the one before the first t, because the ^ matches there, and the negative look-ahead (?!^) rejects any position at which ^ matches (the string / line start).

*Actually, ^ and $ are not characters, just zero-width meta-characters (anchors), so to speak. The regex also matches at the very end, before the $, but split() drops the trailing empty string that this produces, so you don't see an extra element there.
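To make the positions concrete, here is a minimal sketch that lists where each regex matches inside "test". The class and method names (MatchPositions, matchStarts) are made up for illustration. It inspects match positions with a Matcher rather than relying on the output of split(""), because that output depends on the Java version: since Java 8, a zero-width match at the start no longer produces the leading empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class MatchPositions {
    // Returns the start index of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // The empty regex matches at every position, including index 0.
        System.out.println(matchStarts("", "test"));      // [0, 1, 2, 3, 4]
        // (?!^) rejects index 0, the only position where ^ matches.
        System.out.println(matchStarts("(?!^)", "test")); // [1, 2, 3, 4]
        // Splitting at 1, 2, 3, 4 and dropping the trailing empty string:
        System.out.println(String.join(",", "test".split("(?!^)"))); // t,e,s,t
    }
}
```

The match at index 4 is the one at the end of the string; the empty piece after it is what split() discards by default.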

dognose
2

You first need to understand why the returned array contains an empty first element. When a delimiter occurs at index 0 of the string, split() splits on it like on any other occurrence. The left side of that delimiter is an empty string, which is what gets stored at index 0 of the array.

So the following code gives an empty string as the first array element:

"#ab#c".split("#");  // ["", "ab", "c"]

However, if # was not the first character of the string, you wouldn't have got the empty string at index 0.

Now, if you don't want the empty string as the first element, you just need to avoid splitting on the first #. How would you do that? Just ensure that the # you are splitting on is not at the beginning of the string (^), by using a negative look-behind:

"#ab#c".split("(?<!^)#");  // ["#ab", "c"]

This regex splits on # only when it is not preceded by the beginning of the string, (?<!^). ^ denotes the beginning of the string, and (?<!...) denotes a negative look-behind. Since the first # is no longer a delimiter, it stays in the first element.
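As a quick check of the two # examples, here is a small sketch that runs both calls side by side. The class and method names (HashSplit, splitPlain, splitGuarded) are invented for this illustration.

```java
import java.util.Arrays;

// Hypothetical demo class, only for illustration.
class HashSplit {
    // Splits on every '#', including one at index 0.
    static String[] splitPlain(String s) {
        return s.split("#");
    }

    // Splits on '#' only when it is not at the beginning of the string.
    static String[] splitGuarded(String s) {
        return s.split("(?<!^)#");
    }

    public static void main(String[] args) {
        // The delimiter at index 0 leaves an empty string on its left.
        System.out.println(Arrays.toString(splitPlain("#ab#c")));   // [, ab, c]
        // The leading '#' is skipped as a delimiter, so it remains in the data.
        System.out.println(Arrays.toString(splitGuarded("#ab#c"))); // [#ab, c]
    }
}
```

The look-behind removes the empty first element, but note that the unsplit leading # then stays part of the first element.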


Now, in your case the delimiter is the empty string itself. Remember, a string contains an empty string before every character, and after the last character too. So simply splitting on the empty string also splits on the delimiter before the first character. Instead, you need to split on every empty string except the one at the beginning. Replacing # with the empty string:

"abc".split("(?<!^)");  // ["a", "b", "c"]

The negative look-ahead (?!^) works the same way, but IMO the negative look-behind is more intuitive here.
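A short sketch comparing the two variants on "abc" follows; the class and method names (EmptyDelimiter, viaLookbehind, viaLookahead, keepTrailing) are made up for illustration. The third method passes a negative limit to split(), which is documented to keep trailing empty strings, and so makes the match at the end of the string visible.

```java
import java.util.Arrays;

// Hypothetical demo class, only for illustration.
class EmptyDelimiter {
    static String[] viaLookbehind(String s) {
        return s.split("(?<!^)");
    }

    static String[] viaLookahead(String s) {
        return s.split("(?!^)");
    }

    // A negative limit tells split() to keep trailing empty strings.
    static String[] keepTrailing(String s) {
        return s.split("(?<!^)", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(viaLookbehind("abc"))); // [a, b, c]
        System.out.println(Arrays.toString(viaLookahead("abc")));  // [a, b, c]
        // The split after the last character produces a trailing "".
        System.out.println(Arrays.toString(keepTrailing("abc")));  // [a, b, c, ]
    }
}
```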


Of course, if you just want to break the string into a character array, you can simply use the String#toCharArray() method. Note that it returns a char[], not a String[].
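If an array of one-character strings is still what you need, one possible sketch builds it from toCharArray() without any regex. The class and method names (CharArrayDemo, toStrings) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class, only for illustration.
class CharArrayDemo {
    // Converts each char of s into its own one-character String.
    static String[] toStrings(String s) {
        char[] chars = s.toCharArray();
        String[] out = new String[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = String.valueOf(chars[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("test".toCharArray())); // [t, e, s, t]
        System.out.println(Arrays.toString(toStrings("test")));    // [t, e, s, t]
    }
}
```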

Rohit Jain
1

Hm, maybe I didn't understand your question, but why not use the toCharArray() method?

olyv