4

I am splitting a string by a repeatable delimiter, and I also want to keep the delimiters in the result.

val str = "xxoooooooxxoxoxooooo"
val reg = Regex("(?<=x+)|(?=x+)")
var list = str.split(reg)
println(list) 

The output is [, x, x, ooooooo, x, x, o, x, o, x, ooooo], though I would like to get

[xx, ooooooo, xx, o, x, o, x, ooooo]

Mark
  • 1
    why don't you just match all [`x+|o+`](https://www.regexplanet.com/share/index.html?share=yyyyd3mtaar) – bobble bubble Nov 19 '20 at 19:24
  • 1
    @bobblebubble One reason is that it would not work if the input string contained characters other than `x` and `o`. However, the string in the question only contains `x` and `o`, so the pattern will work for this string. – Wiktor Stribiżew Nov 19 '20 at 19:28
  • @WiktorStribiżew I'm commenting on the sample string and the question, anything else is just your guess. If this is the task, I think using lookarounds is overcomplicated. – bobble bubble Nov 19 '20 at 19:32

2 Answers

5
val str = "xxoooooooxxoxoxooooo"
// Match each run of o's or x's instead of splitting between them
val parts = Regex("o+|x+").findAll(str).map { it.value }.toList()
println(parts)
// [xx, ooooooo, xx, o, x, o, x, ooooo]
user2424380
4

Strictly speaking, using the + quantifier inside a lookbehind is not a documented, supported feature of Java's regex engine (which Kotlin uses on the JVM). It does not throw an exception because internally it is translated into {1,0x7FFFFFFF}, and Java's regex supports constrained-width lookbehind patterns. In any case, the quantifier makes no difference in either the lookbehind or the lookahead: lookarounds are non-consuming, so the regex engine still checks every position in the string for a match, and a position preceded (or followed) by x+ is simply a position preceded (or followed) by x. That is why your pattern splits next to every single x instead of around each run.
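To make the point about non-consuming patterns concrete, one can list the positions where a zero-width pattern matches, since those are the places where split cuts. This is only a minimal sketch: the helper name splitPositions is made up for illustration, and it uses single-character lookarounds rather than the + forms.

```kotlin
// Hypothetical helper (not from the original posts): returns the indices at
// which a zero-width pattern matches, i.e. where split() would cut.
fun splitPositions(pattern: String, input: String): List<Int> =
    Regex(pattern).findAll(input).map { it.range.first }.toList()

fun main() {
    val str = "xxoooooooxxoxoxooooo"

    // Cuts at every position that touches an x, including inside "xx"
    println(splitPositions("(?<=x)|(?=x)", str))
    // [0, 1, 2, 9, 10, 11, 12, 13, 14, 15]

    // Cuts only where a run of x meets a run of o
    println(splitPositions("(?<=x)(?=o)|(?<=o)(?=x)", str))
    // [2, 9, 11, 12, 13, 14, 15]
}
```

The first list contains positions 0, 1 and 10, which is where the unwanted empty string and the single x items come from.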

You can use

(?<=x)(?=o)|(?<=o)(?=x)

See a Kotlin demo:

val str = "xxoooooooxxoxoxooooo"
val reg = Regex("(?<=x)(?=o)|(?<=o)(?=x)")
var list = str.split(reg)
println(list) 
// => [xx, ooooooo, xx, o, x, o, x, ooooo]
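If the input may contain characters other than x and o, as raised in the comments, a possible generalization is to match each run of one repeated character with a backreference instead of naming the characters. This is a sketch under that assumption, not something the question asks for; the function name splitRuns is hypothetical, and (?s) is added so that line breaks are also treated as characters.

```kotlin
// Hypothetical helper: groups the input into runs of one repeated character.
// (.) captures any character, \1* extends the match over its repetitions.
fun splitRuns(input: String): List<String> =
    Regex("(?s)(.)\\1*").findAll(input).map { it.value }.toList()

fun main() {
    println(splitRuns("xxoooooooxxoxoxooooo"))
    // [xx, ooooooo, xx, o, x, o, x, ooooo]
    println(splitRuns("aabccc"))
    // [aa, b, ccc]
}
```

Unlike the lookaround pattern, this treats every change of character as a boundary, so it only fits if that is the intended rule.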
Wiktor Stribiżew