10

I want to split the following string

"ATextIWantToDisplayWithSpaces"

like this

A Text I Want To Display With Spaces.

I tried this code in R

strsplit(x="ATextIWantToDisplayWithSpaces", split=[:upper:])

which produces this error

Error: unexpected '[' in "strsplit(x="ATextIWantToDisplayWithSpaces", split=["

Any help will be highly appreciated. Thanks

MYaseen208

4 Answers

34

Use gsub with a capture group. It works by (a) locating each uppercase letter, (b) capturing it in a group, and (c) replacing it with a space followed by the captured letter.

gsub('([[:upper:]])', ' \\1', x)
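
To make the three steps concrete, here is a minimal sketch run on the string from the question. The variable `x` is assumed to hold that string, and the `trimws()` call is an addition for illustration rather than part of the answer itself; it strips the space that the replacement puts in front of a leading capital.

```r
# Assumed input: the string from the question.
x <- "ATextIWantToDisplayWithSpaces"

# Put a space in front of every uppercase letter (the captured group \\1).
spaced <- gsub("([[:upper:]])", " \\1", x)
print(spaced)

# The first letter is uppercase too, so the result starts with a space;
# trimws() from base R removes it.
result <- trimws(spaced)
print(result)
```
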
Ramnath
  • 6
    Excellent! Here is a slight modification that avoids the leading space: `gsub("(?!^)(?=[[:upper:]])", " ", x, perl=T)` – kohske Nov 03 '11 at 00:46
  • 2
    `gsub('([[:upper:]])', ' \\1', x)` puts an extra space at the beginning if the first letter is uppercase, but `gsub("(?!^)(?=[[:upper:]])", " ", x, perl=T)` doesn't. – MYaseen208 Nov 03 '11 at 00:57
  • 3
    Or this, which also does not insert a space at the beginning: `gsub("(.)([[:upper:]])", "\\1 \\2", x)` – G. Grothendieck Nov 03 '11 at 11:53
  • 1
    This also avoids the leading space, without requiring the `perl` flag, by matching only where there is no word boundary (`\\B`): `gsub('\\B([[:upper:]])', ' \\1', x)` – manotheshark Apr 17 '20 at 14:57
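
Two of the variants from these comments can be compared side by side. This is a sketch that assumes `x` holds the string from the question; the variable names are illustrative. One caveat worth checking on your own data: because `gsub` matches do not overlap, the `(.)([[:upper:]])` pattern consumes the character before each capital, so it can skip a capital that directly follows another one (here, the `W` after `I`).

```r
x <- "ATextIWantToDisplayWithSpaces"

# Lookahead version: inserts a space at every position that is followed by
# an uppercase letter, except at the start of the string. Needs perl = TRUE.
lookahead <- gsub("(?!^)(?=[[:upper:]])", " ", x, perl = TRUE)

# Two-group version: matches "any character + uppercase letter" pairs.
pairs <- gsub("(.)([[:upper:]])", "\\1 \\2", x)

print(lookahead)
print(pairs)

# The two results differ for this input: "IWant" stays unsplit in `pairs`.
print(identical(lookahead, pairs))
```
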
9

An answer to your specific question ("How do I split on uppercase letters?") is

strsplit(x="ATextIWantToDisplayWithSpaces", split="[[:upper:]]")

but @Ramnath's answer is what you actually want. strsplit throws away the characters on which it splits. The splitByPattern function from R.utils is closer, but it still won't return the results in the most convenient form for you.
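
A short sketch of what "throws away the characters" means in practice, again assuming `x` is the string from the question. The check at the end is only there to illustrate the point.

```r
x <- "ATextIWantToDisplayWithSpaces"

# strsplit() returns a list with one character vector per input string.
pieces <- strsplit(x, split = "[[:upper:]]")[[1]]
print(pieces)

# The uppercase letters were used as delimiters, so none of them survive
# in the pieces.
print(any(grepl("[[:upper:]]", pieces)))
```
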

Ben Bolker
  • Doh! You got me this time. :) – Roman Luštrik Nov 03 '11 at 00:47
  • @BenBolker: Thanks for your reply. Comment by kohske is more flexible. – MYaseen208 Nov 03 '11 at 00:53
  • 1
    This answers the specific question: `strsplit(x, "(?!^)(?=[[:upper:]])", perl=T)` . (I would like to know, though, why this doesn't quite work: `strsplit(x, "(?=[[:upper:]])", perl=T)`. In particular, why does it fail in the way that it does?) @kohske -- as a local regexp guru, do you have any insight? – Josh O'Brien Nov 03 '11 at 17:33
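
The two patterns from the last comment can be tried directly. This sketch assumes `x` is the string from the question. A possible explanation for the difference, offered tentatively: `strsplit` appears to search the remaining text afresh after each piece, so a bare lookahead can match with zero length at the very start of the remainder, whereas `(?!^)` forbids a match at that position.

```r
x <- "ATextIWantToDisplayWithSpaces"

# Split before every uppercase letter, except at the start of the text.
words <- strsplit(x, "(?!^)(?=[[:upper:]])", perl = TRUE)[[1]]
print(words)

# Without (?!^) the pieces come out differently; compare the printed output.
naive <- strsplit(x, "(?=[[:upper:]])", perl = TRUE)[[1]]
print(naive)
```
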
1

I know this is an old question, but I adapted the solution above for a case where I needed to split the values of a data frame column at uppercase letters and keep only the second element. This solution uses dplyr and purrr:

df %>% mutate(stringvar= map(strsplit(stringvar, "(?!^)(?=[[:upper:]])", perl=T),~.x[2]) %>% unlist())
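
For readers without dplyr and purrr, roughly the same idea can be sketched in base R. The data frame `df` and its column `stringvar` below are made-up example data, and `second_word` is a hypothetical column name.

```r
# Hypothetical example data.
df <- data.frame(
  stringvar = c("ATextIWant", "HelloWorldAgain"),
  stringsAsFactors = FALSE
)

# Split each value before its uppercase letters (except at the start) ...
parts <- strsplit(df$stringvar, "(?!^)(?=[[:upper:]])", perl = TRUE)

# ... and keep the second piece of each; NA if there is no second piece.
df$second_word <- vapply(parts, function(p) p[2], character(1))
print(df)
```
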
RTutt
1

Using stringr

library(stringr)

str_replace_all(
  string =  "ATextIWantToDisplayWithSpaces",
  pattern = "([[:upper:]])",
  replacement = " \\1"
) %>% 
  str_trim()

#[1] "A Text I Want To Display With Spaces"
jpdugo17