Split a string based on upper case letters

Question

How can I split a string at each uppercase letter it contains? I couldn't find any help on the internet.

a<-"MiXeD"
b<-"ServiceEventId"

I would like to get

a<-c("Mi", "Xe", "D")
b<-c("Service", "Event", "Id")

See here for some options (comments on second answer specifically): http://stackoverflow.com/questions/7988959/splitting-string-based-on-letters-case - strange how that was the **first** result on Google searching for `R Split a string based on upper case letters`, which is essentially your question title. Hmmmm? — thelatemail, Jan 14 '16 at 00:13
See: http://stackoverflow.com/questions/22528625/how-to-convert-camelcase-to-not-camel-case-in-r — G. Grothendieck, Jan 14 '16 at 00:23
@thelatemail Probably worth emphasizing (even though I know *you* know it) that neither of the answers there does what the OP here is asking for. (As you hint, my comment -- the third one below Ben Bolker's answer -- does.) — Josh O'Brien, Jan 14 '16 at 00:42
@thelatemail, You are referencing a different SO post than the one I referenced. — G. Grothendieck, Jan 14 '16 at 00:46

score 2 · Answer 1 · edited May 23 '17 at 12:23


Here's one option, which uses one lookbehind and one lookahead assertion to find (and then split at) the zero-width positions between characters that are immediately followed by an uppercase letter. To learn why both a lookahead and a lookbehind assertion are needed (i.e. not just a lookahead assertion), see this question and its answers.

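```r
# Split at each position that is preceded by at least one character
# and followed by an uppercase letter
f <- function(x) {
    strsplit(x, "(?<=.)(?=[[:upper:]])", perl=TRUE)
}

f(a)
# [[1]]
# [1] "Mi" "Xe" "D"

f(b)
# [[1]]
# [1] "Service" "Event"   "Id"
```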
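To see why the lookbehind matters, here is a small base-R comparison (a sketch; the helper names `with_lookbehind` and `lookahead_only` are illustrative, not from any package). As far as I can tell, `strsplit()` re-runs the regex on whatever is left of the string after each split, and a zero-width match at the very start of that remainder makes it peel off a single character. The `(?<=.)` lookbehind rules out such a match, because nothing precedes position 0 of the remainder.

```r
# Illustrative helpers (hypothetical names): both split before each
# uppercase letter; only the first also requires a preceding character.
with_lookbehind <- function(x) strsplit(x, "(?<=.)(?=[[:upper:]])", perl = TRUE)
lookahead_only  <- function(x) strsplit(x, "(?=[[:upper:]])", perl = TRUE)

inputs <- c("MiXeD", "ServiceEventId", "thisIsMixed")

good <- with_lookbehind(inputs)  # one character vector per input string
bad  <- lookahead_only(inputs)   # over-splits; compare it with `good`
print(good)
print(bad)

# Joining the pieces with spaces, e.g. "ServiceEventId" -> "Service Event Id"
spaced <- vapply(good, paste, character(1), collapse = " ")
print(spaced)
```

The third input shows that a leading lowercase run (`"this"`) is kept as its own piece, since the pattern only ever splits *before* an uppercase letter.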

edited May 23 '17 at 12:23 by Community · answered Jan 14 '16 at 00:16 by Josh O'Brien

Just for giggles, here's a `regmatches` adaptation: `regmatches(d,gregexpr("([[:upper:]]|^)([^[:upper:]]+|$)",d))` – thelatemail Jan 14 '16 at 00:19
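Applied to the question's strings, the `regmatches()` variant above might look like the following sketch. The comment doesn't define `d`; here it is assumed to be a character vector holding the strings to split. Unlike `strsplit()`, `gregexpr()` locates the pieces themselves: each match is an uppercase letter (or the start of the string) followed by a run of non-uppercase characters (or the end of the string).

```r
# `d` is assumed to be the character vector of strings to split
d <- c("MiXeD", "ServiceEventId", "thisIsMixed")

# Each match: an uppercase letter (or the start of the string), followed by
# one or more non-uppercase characters (or the end of the string)
pieces <- regmatches(d, gregexpr("([[:upper:]]|^)([^[:upper:]]+|$)", d))
print(pieces)
```

The `$` alternative is what lets a trailing lone capital such as the final `"D"` in `"MiXeD"` count as a piece on its own.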

score 2 · Answer 2 · edited Jan 14 '16 at 01:24


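Use `str_extract_all()` from the stringr package (here `x` is the character vector to split, e.g. `x <- c(a, b)`):

```r
library(stringr)
str_extract_all(x, "[A-Z][a-z]*")
```

or, to also keep a leading run of lowercase letters (such as the `this` in `"thisIsMixed"`):

```r
str_extract_all(x, "[A-Z][a-z]*|[a-z]+")
```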
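If you'd rather not depend on stringr, base R's `regmatches()` combined with `gregexpr()` extracts every match of a pattern in much the same way. The sketch below (the helper name `extract_words` is hypothetical) runs both patterns on the question's strings plus `"thisIsMixed"`, which shows that only the second pattern keeps the leading lowercase run.

```r
# Hypothetical base-R stand-in for stringr::str_extract_all()
extract_words <- function(x, pattern) regmatches(x, gregexpr(pattern, x))

x <- c("MiXeD", "ServiceEventId", "thisIsMixed")

caps_only  <- extract_words(x, "[A-Z][a-z]*")         # drops a leading lowercase run
with_lower <- extract_words(x, "[A-Z][a-z]*|[a-z]+")  # keeps it as its own piece
print(caps_only)
print(with_lower)
```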

edited Jan 14 '16 at 01:24 by Josh O'Brien · answered Jan 14 '16 at 00:27 by Avinash Raj

I think this falls over if the string starts with a lowercase letter or some other non-uppercase character, e.g. `"thisIsMixed"` – thelatemail Jan 14 '16 at 00:32
`str_extract_all(x, "[A-Z][a-z]*|[a-z]+")` – Avinash Raj Jan 14 '16 at 00:33
