11

How can I efficiently split the following string on the first comma using base R?

x <- "I want to split here, though I don't want to split elsewhere, even here."
strsplit(x, ???)

Desired outcome (2 strings):

[[1]]
[1] "I want to split here"   "though I don't want to split elsewhere, even here."

Thank you in advance.

EDIT: I didn't think to mention this, but the solution needs to generalize to a column or vector of strings, as in:

y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

The outcome can be two columns, one long vector (from which I can take every other element), or a list of strings with each index ([[n]]) holding two strings.

Apologies for the lack of clarity.

Tyler Rinker
  • extremely hacky, but what about something like `list(head(y[[1]],1), paste(tail(y[[1]],-1), collapse = ","))` where `y` is the output of `strsplit(x, ...)`? – Chase Apr 25 '12 at 04:08
  • @Chase, I tried it but couldn't get it to work for a vector of similar strings. I edited my original post to further explain the problem. – Tyler Rinker Apr 25 '12 at 04:17
  • `str_locate_all(string=y, ',')` from stringr will find the index locations of every match of your pattern (the comma, in your case), which can then be used to select from a vector or column. – John Apr 25 '12 at 04:23

5 Answers

13

Here's what I'd probably do. It may seem hacky, but since sub() and strsplit() are both vectorized, it will also work smoothly when handed multiple strings.

XX <- "SoMeThInGrIdIcUlOuS"
strsplit(sub(",\\s*", XX, x), XX)
# [[1]]
# [1] "I want to split here"                               
# [2] "though I don't want to split elsewhere, even here."
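
As a minimal sketch of how this might carry over to the vector `y` from the question: the same sentinel trick is applied to every element, and the resulting list can be bound into a two-column matrix. The `regmatches()` call is a swapped-in alternative that avoids the sentinel string; the names `parts`, `parts2`, and `mat` are illustrative only.

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# Sentinel approach: replace only the first comma (sub() touches the first
# match only), then split on the sentinel.
XX <- "SoMeThInGrIdIcUlOuS"
parts <- strsplit(sub(",\\s*", XX, y), XX, fixed = TRUE)

# Alternative without a sentinel: regexpr() locates the first match only,
# and invert = TRUE returns the text before and after that match.
parts2 <- regmatches(y, regexpr(",\\s*", y), invert = TRUE)

# One row per input string, two columns
# (assumes every string contains at least one comma).
mat <- do.call(rbind, parts)
```

Both variants leave any later commas untouched, since only the first match is ever replaced or located.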
Josh O'Brien
10

From the stringr package:

library(stringr)
str_split_fixed(x, pattern = ', ', n = 2)
#      [,1]                  
# [1,] "I want to split here"
#      [,2]                                                
# [1,] "though I don't want to split elsewhere, even here."

(That's a matrix with one row and two columns.)

flodel
4

Here is yet another solution, with a regular expression to capture what is before and after the first comma.

x <- "I want to split here, though I don't want to split elsewhere, even here."
library(stringr)
str_match(x, "^(.*?),\\s*(.*)")[,-1] 
# [1] "I want to split here"                              
# [2] "though I don't want to split elsewhere, even here."
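
Since the question asks for base R, here is a rough base-only sketch of the same capture-group idea using `regexec()` and `regmatches()`. It assumes every string contains at least one comma (otherwise `vapply()` will stop with an error), and it swaps the lazy `.*?` for `[^,]*`, which matches the same text up to the first comma. The names `m` and `pieces` are illustrative.

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# regexec() reports the full match plus each capture group;
# regmatches() extracts them as one character vector per input string.
m <- regmatches(y, regexec("^([^,]*),\\s*(.*)$", y))

# Drop the full match (first element) and keep the two captured groups,
# giving one row per input string.
pieces <- t(vapply(m, function(z) z[-1], character(2)))
```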
Vincent Zoonekynd
3

library(stringr)

str_sub(x,end = min(str_locate(string=x, ',')-1))

This will get the first part you want. Change the start= and end= arguments in str_sub to get whatever else you want.

Such as:

str_sub(x,start = min(str_locate(string=x, ',')+1 ))

and wrap in str_trim to get rid of the leading space:

str_trim(str_sub(x,start = min(str_locate(string=x, ',')+1 )))
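
Note that `min()` collapses the positions to a single number, so the calls above suit one string at a time. A possible base R sketch of the same locate-then-substring idea that works on a whole vector is shown here; `pos`, `left`, and `right` are illustrative names, and `trimws()` requires R 3.2.0 or later.

```r
y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")

# Position of the first comma in each string (-1 where there is none).
pos <- regexpr(",", y, fixed = TRUE)

# Text before the first comma; strings without a comma are kept whole.
left <- ifelse(pos > 0, substring(y, 1, pos - 1), y)

# Text after the first comma, with surrounding whitespace trimmed;
# NA where there is no comma.
right <- ifelse(pos > 0, trimws(substring(y, pos + 1)), NA_character_)

cbind(left, right)
```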

John
2

This works, but I like Josh O'Brien's better:

# Split on every comma, then glue everything after the first piece back together
y <- strsplit(x, ",")
sapply(y, function(x) data.frame(x = x[1],
    z = paste(x[-1], collapse = ",")), simplify = FALSE)

Inspired by Chase's response.

A number of people gave non-base approaches, so I figured I'd add the one I usually use (though in this case I needed a base response):

y <- c("Here's comma 1, and 2, see?", "Here's 2nd sting, like it, not a lot.")
library(reshape2)
colsplit(y, ",", c("x","z"))
Tyler Rinker
  • In your first part I don't see why you would use sapply over the seq_along(y) instead of just y. You don't look like you ever actually need the index explicitly. It also looks like you're removing all the commas even though you wanted them to be kept in the other strings? – Dason Oct 06 '12 at 20:35