I need to split a string on words and end marks (certain types of punctuation). Oddly, the pipe ("|") can count as an end mark. I have code that works on end marks until I try to add the pipe. Adding the pipe makes strsplit split on every character. Escaping it causes an error. How can I include the pipe in the regular expression?
x <- "I like the dog|."
strsplit(x, "[[:space:]]|(?=[.!?*-])", perl=TRUE)
#[[1]]
#[1] "I" "like" "the" "dog|" "."
strsplit(x, "[[:space:]]|(?=[.!?*-\|])", perl=TRUE)
#Error: '\|' is an unrecognized escape in character string starting "[[:space:]]|(?=[.!?*-\|"
The outcome I'd like:
#[[1]]
#[1] "I" "like" "the" "dog" "|" "." #pipe is an element
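One possible fix, offered as a sketch rather than a definitive answer: inside a character class the pipe is already literal, so the likely culprit is the hyphen. In `[.!?*-|]` the sequence `*-|` reads as a range from `*` to `|`, which covers every digit and letter, and that would explain the split on every character. Placing the hyphen last in the class keeps it literal. Separately, an R string literal needs `\\|` rather than `\|`, although the escape is not required here. The names `pattern` and `result` below are illustrative only.

```r
x <- "I like the dog|."

# Hyphen moved to the end of the class so that "*-|" is no longer a range;
# the pipe needs no escaping inside [ ].
pattern <- "[[:space:]]|(?=[.!?*|-])"

result <- strsplit(x, pattern, perl = TRUE)
print(result[[1]])
```

With the sample string this should give the six elements listed in the desired outcome, with the pipe as its own element.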