
The obvious extension of the question "R split on delimiter (split) keep the delimiter (split)" is: how do I split a string while keeping each delimiter at the beginning of the part that follows it?

x <- "What is this?  It's an onion.  What! That's| Well Crazy."

The solution from that question,

unlist(strsplit(x, "(?<=[?.!|])", perl=TRUE))

gives:

"What is this?"    "  It's an onion." "  What!" " That's|" " Well Crazy."

Whereas I'm looking for:

"What is this"    "? It's an onion" ".  What" "! That's" "| Well Crazy."

Changing the positive lookbehind into a positive lookahead doesn't solve the problem.
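To make the failure concrete, here is a small sketch of the lookahead variant. As far as I understand `strsplit`, a zero-width match at the very start of the remaining string makes it emit a single character, so each delimiter seems to end up in an element of its own rather than at the front of the next part.

```r
x <- "What is this?  It's an onion.  What! That's| Well Crazy."

# Same character class as in the lookbehind version, but as a positive lookahead
parts <- unlist(strsplit(x, "(?=[?.!|])", perl = TRUE))

# The delimiters are split off on their own instead of leading the next part
print(parts)
```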

MarkusN

1 Answer


I managed to solve it using a positive lookahead followed by a word boundary marker:

x <- "What is this?  It's an onion.  What! That's| Well Crazy."
unlist(strsplit(x, "(?=[?.!|].)\\b", perl=TRUE))

[1] "What is this"     "?  It's an onion" ".  What"          "! That's"        
[5] "| Well Crazy."
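Note that the `\\b` trick depends on every delimiter being directly preceded by a word character, which holds for this input. As a hedged alternative that does not rely on that, one could first put a marker in front of each delimiter and then split on the marker. The helper name `split_before` and the `\001` marker are my own choices for illustration, and the sketch assumes a single input string that never contains that control character.

```r
# Hypothetical helper: split `x` so that each delimiter starts the next part.
# Assumes `x` is one string that does not contain the marker "\001".
split_before <- function(x, delims = "[?.!|]") {
  marker <- "\001"
  # Mark every delimiter that is followed by at least one more character,
  # so a trailing delimiter stays attached to the last part.
  pattern <- paste0("(", delims, ")(?=.)")
  marked <- gsub(pattern, paste0(marker, "\\1"), x, perl = TRUE)
  # Split on the literal marker; the delimiters themselves are kept.
  strsplit(marked, marker, fixed = TRUE)[[1]]
}

x <- "What is this?  It's an onion.  What! That's| Well Crazy."
parts <- split_before(x)
print(parts)
```

For the example string this should give the same five parts as above. If the string starts with a delimiter, the first element is an empty string.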


Tim Biegeleisen