I want to use strsplit to split a string before every capital letter, using a positive lookahead. However, it also splits after every capital letter, and I'm confused about that. Is this regex incompatible with strsplit? Why does this happen, and what needs to change?

strsplit('AaaBbbCcc', '(?=\\p{Lu})', perl=TRUE)[[1]]
strsplit('AaaBbbCcc', '(?=[A-Z])', perl=TRUE)[[1]]
strsplit('AaaBbbCcc', '(?=[ABC])', perl=TRUE)[[1]]
# [1] "A"  "aa" "B"  "bb" "C"  "cc"

Expected result:

# [1] "Aaa" "Bbb" "Ccc"

In the Demo it actually looks fine.

Ideally, it should split only before a camel-case boundary, i.e. before Aa but not before AA; there's \\p{Lt}, but this doesn't seem to work at all.

strsplit('AaaABbbBCcc', '(?=\\p{Lt})', perl=TRUE)[[1]]
# [1] "AaaABbbBCcc"

Expected result:

# [1] "AaaA" "BbbB" "Ccc" 
jay.sf

1 Answer

It seems that by adding (?!^), which forbids a match at the start of the string, you can obtain the desired result.

strsplit('AaaBbbCcc', "(?!^)(?=[A-Z])", perl=TRUE)
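
As far as I can tell, the extra splits come from how strsplit works rather than from the regex. It searches the remaining string repeatedly, and when a zero-length match sits at the very start of what is left, it splits off a single character. After each split the remainder begins with a capital letter, so the lookahead matches again at its first position. The sketch below checks this with gregexpr, which reports matches before the capitals only (the variable names are just for illustration):

```r
x <- 'AaaBbbCcc'

# Positions where the zero-width lookahead matches: before each capital only
m <- gregexpr('(?=[A-Z])', x, perl=TRUE)[[1]]
pos <- as.integer(m)
pos
# [1] 1 4 7

# Forbidding a match at the start of the (remaining) string avoids
# the one-character pieces
res <- strsplit(x, '(?!^)(?=[A-Z])', perl=TRUE)[[1]]
res
# [1] "Aaa" "Bbb" "Ccc"
```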

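For the camel case, we can look ahead for an uppercase letter followed by a lowercase one (\\p{Lt} matches single titlecase characters such as "ǅ", not an uppercase-lowercase pair):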

strsplit('AaaABbbBCcc', '(?!^)(?=\\p{Lu}\\p{Ll})', perl=TRUE)[[1]]
strsplit('AaaABbbBCcc', '(?!^)(?=[A-Z][a-z])', perl=TRUE)[[1]]  ## or
# [1] "AaaA" "BbbB" "Ccc" 
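
If this is needed in several places, the pattern can be wrapped in a small helper. This is only a sketch: split_camel is a made-up name, not a base R function, and it returns a list with one element per input string, just as strsplit does.

```r
# split_camel is a hypothetical helper name, not part of base R
split_camel <- function(x) {
  # Split before an uppercase letter that is followed by a lowercase one,
  # but never at the start of the (remaining) string
  strsplit(x, '(?!^)(?=\\p{Lu}\\p{Ll})', perl=TRUE)
}

out <- split_camel(c('AaaABbbBCcc', 'AaaBbbCcc'))
out[[1]]
# [1] "AaaA" "BbbB" "Ccc"
out[[2]]
# [1] "Aaa" "Bbb" "Ccc"
```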
jay.sf
Giulio Mattolin