3

How can I split a string at each uppercase letter it contains? I couldn't find any help on the internet.

a<-"MiXeD"
b<-"ServiceEventId"

I would like to get:

a<-c("Mi", "Xe", "D")
b<-c("Service", "Event", "Id")
Pierre L
JeanVuda
  • See here for some options (comments on second answer specifically): http://stackoverflow.com/questions/7988959/splitting-string-based-on-letters-case - strange how that was the **first** result on Google searching for `R Split a string based on upper case letters`, which is essentially your question title. Hmmmm? – thelatemail Jan 14 '16 at 00:13
  • See: http://stackoverflow.com/questions/22528625/how-to-convert-camelcase-to-not-camel-case-in-r – G. Grothendieck Jan 14 '16 at 00:23
  • @thelatemail Probably worth emphasizing (even though I know *you* know it) that neither of the answers there does what the OP here is asking for. (As you hint, my comment -- the third one below Ben Bolker's answer -- does.) – Josh O'Brien Jan 14 '16 at 00:42
  • @thelatemail, You are referencing a different SO post than the one I referenced. – G. Grothendieck Jan 14 '16 at 00:46

2 Answers

2

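Here's one option, which uses one lookbehind and one lookahead assertion to find (and then split at) the zero-width positions between characters that are immediately followed by an uppercase letter. The lookbehind `(?<=.)` requires at least one character before the split point, so the pattern cannot match at the very start of the string. To learn why both a lookahead and a lookbehind assertion are needed (i.e. not just a lookahead assertion), see this question and its answers.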

f <- function(x) {
    strsplit(x, "(?<=.)(?=[[:upper:]])", perl=TRUE)
}

f(a)
# [[1]]
# [1] "Mi" "Xe" "D" 

f(b)
# [[1]]
# [1] "Service" "Event"   "Id"  
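Since `strsplit` is vectorized, the same pattern can be applied to several strings in one call. The following is a minimal sketch of that usage; the names `split_upper`, `inputs`, and `parts` are illustrative and not part of the answer above.

```r
# Split at every position that has a character before it and an
# uppercase letter right after it (same pattern as f() above).
split_upper <- function(x) {
  strsplit(x, "(?<=.)(?=[[:upper:]])", perl = TRUE)
}

inputs <- c("MiXeD", "ServiceEventId")  # the two strings from the question
parts <- split_upper(inputs)            # a list, one character vector per input

parts[[1]]
# [1] "Mi" "Xe" "D"
parts[[2]]
# [1] "Service" "Event"   "Id"
```

The result is always a list, even for a single input string, so use `[[1]]` (or `unlist()`) to get a plain character vector.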
Josh O'Brien
  • Just for giggles, here's a `regmatches` adaptation - `regmatches(d,gregexpr("([[:upper:]]|^)([^[:upper:]]+|$)",d))` – thelatemail Jan 14 '16 at 00:19
2

Use str_extract_all from the stringr package:

library(stringr)
# x is the input character vector, e.g. a or b from the question
str_extract_all(x, "[A-Z][a-z]*")

or, to also keep a leading run of lowercase letters (as in camelCase input):

str_extract_all(x, "[A-Z][a-z]*|[a-z]+")
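For readers without stringr installed, the same extract-instead-of-split idea can probably be sketched in base R with `regmatches()` and `gregexpr()`. This swaps those base functions in for `str_extract_all`; it is not the answer's own code. The helper name `extract_words` and the input `"camelCaseId"` are hypothetical, chosen only to show the difference between the two patterns.

```r
# Base R stand-in for str_extract_all(): return every match of `pattern`.
# perl = TRUE keeps [A-Z] and [a-z] as plain ASCII ranges.
extract_words <- function(x, pattern) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

x <- c("MiXeD", "ServiceEventId", "camelCaseId")  # last item is a made-up example

# First pattern: every match must start with an uppercase letter,
# so the leading "camel" in the third string is dropped.
upper_only <- extract_words(x, "[A-Z][a-z]*")

# Second pattern: runs of lowercase letters are matched as well,
# so "camel" is kept.
with_lower <- extract_words(x, "[A-Z][a-z]*|[a-z]+")

upper_only[[3]]
# [1] "Case" "Id"
with_lower[[3]]
# [1] "camel" "Case"  "Id"
```

Both patterns give the same pieces for the two strings in the question, since each of them starts with an uppercase letter.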
Josh O'Brien
Avinash Raj