I would like to split the following string by its periods. I tried strsplit() with "." in the split argument, but did not get the result I want.

s <- "I.want.to.split"
strsplit(s, ".")
[[1]]
 [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""

The output I want is to split s into 4 elements in a list, as follows.

[[1]]
[1] "I"     "want"  "to"    "split"

What should I do?

Rich Scriven
user3022875

3 Answers

The split argument of strsplit() is treated as a regular expression by default, so you have to escape the . as \\. or put it in a character class, [.]. Otherwise . keeps its special regex meaning, "any single character", and every character in the string is treated as a separator.

s <- "I.want.to.split"
strsplit(s, "[.]")
# [[1]]
# [1] "I"     "want"  "to"    "split"

But the more efficient method here is to use the fixed argument in strsplit(). Using this argument will bypass the regex engine and search for an exact match of ".".

strsplit(s, ".", fixed = TRUE)
# [[1]]
# [1] "I"     "want"  "to"    "split"

And of course, you can see help(strsplit) for more.
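
As a quick check, the three patterns discussed here can be compared side by side. This is only a minimal sketch; the names by_class, by_escape, and by_fixed are illustrative and not part of any API.

```r
s <- "I.want.to.split"

# Regex with the dot inside a character class
by_class <- strsplit(s, "[.]")
# Regex with the dot escaped (two backslashes in the R string)
by_escape <- strsplit(s, "\\.")
# Literal match; the pattern is not interpreted as a regex
by_fixed <- strsplit(s, ".", fixed = TRUE)

# All three calls return the same list
identical(by_class, by_escape) && identical(by_class, by_fixed)
# [1] TRUE

# strsplit() returns a list with one element per input string;
# take the first element to get a plain character vector
by_fixed[[1]]
# [1] "I"     "want"  "to"    "split"
```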

Rich Scriven

You need to either place the dot . inside a character class or precede it with two backslashes to escape it. In a regular expression the dot is a special character meaning "match any single character" (in many regex flavors, any character except a newline).

s <- 'I.want.to.split'
strsplit(s, '\\.')
# [[1]]
# [1] "I"     "want"  "to"    "split"
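
The reason for two backslashes may be easier to see in a short sketch: the R string parser consumes one of them, so the regex engine receives a single backslash followed by a dot. The raw-string form at the end assumes R 4.0.0 or later.

```r
# The R string "\\." holds two characters: a backslash and a dot
nchar("\\.")
# [1] 2

# cat() shows the pattern as the regex engine receives it
cat("\\.\n")
# \.

# Raw strings (assuming R >= 4.0.0) avoid the doubled backslash
s <- "I.want.to.split"
same <- identical(strsplit(s, r"(\.)"), strsplit(s, "\\."))
same
# [1] TRUE
```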
hwnd

Besides strsplit(), you can also use scan(). Try:

scan(what = "", text = s, sep = ".")
# Read 4 items
# [1] "I"     "want"  "to"    "split"
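
One difference worth noting is that scan() returns a character vector directly, while strsplit() returns a list. A small sketch, where parts is just an illustrative name and quiet = TRUE is used to suppress the "Read 4 items" message:

```r
s <- "I.want.to.split"

# quiet = TRUE suppresses the "Read 4 items" message
parts <- scan(text = s, what = "", sep = ".", quiet = TRUE)

# scan() gives a character vector, so compare it with the
# first list element returned by strsplit()
identical(parts, strsplit(s, ".", fixed = TRUE)[[1]])
# [1] TRUE
```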
nghauran