I have a data set of strings and want to extract the substring up to and including the first colon. Earlier I asked here how to extract just the portion after the first colon ("Split strings at the first colon"). Below I list a few of my attempts at solving the current problem.
I know that ^[^:]+: matches the portion I want to keep, but I cannot figure out how to extract that portion.
Here is an example data set and the desired result.
my.data <- "here is: some text
here is some more.
even: more text
still more text
this text keeps: going."
my.data2 <- readLines(textConnection(my.data))
desired.result <- "here is:
0
even:
0
this text keeps:"
desired.result2 <- readLines(textConnection(desired.result))
# Here are some of my attempts
# blanks lines 2 and 4, but returns lines 1, 3, and 5 whole instead of extracting the portion I want
ifelse( my.data2 == gsub("^[^:]+:", "", my.data2), '', my.data2)
# returns the portion I do not want rather than the portion I do want
# (the pattern has no capture group, so \\1 is empty and the match is simply deleted)
sub("^[^:]+:", "\\1", my.data2, perl=TRUE)
# returns an entire line if it contains a colon
grep("^[^:]+:", my.data2, value=TRUE)
# identifies which rows contain a match
regexpr("^[^:]+:", my.data2)
# my attempt at anchoring the right end instead of the left end
regexpr("[^:]+:$", my.data2)
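To make the regexpr() output above concrete, here is a minimal sketch of what it reports for the example data: a start position and a match length for each element, not the matched text itself. The names starts and lens are mine, used only for illustration, and the data vector is retyped by hand so the block runs on its own.

```r
# Same five lines as my.data2 above, retyped so this block is self-contained
my.data2 <- c("here is: some text",
              "here is some more.",
              "even: more text",
              "still more text",
              "this text keeps: going.")

m <- regexpr("^[^:]+:", my.data2)

# Start position of the first match in each element; -1 means no match
starts <- as.integer(m)

# Length of each match in characters; -1 means no match
lens <- attr(m, "match.length")

starts
lens
```

So the positions and lengths of the pieces I want are available; what I am missing is the step that turns them into the substrings themselves.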
A related earlier question, "Regular Expression Opposite", concerns returning the opposite of a match. I have not figured out how to implement that solution in R if I start with the solution to my earlier question linked above.
I have recently obtained RegexBuddy to study regular expressions. That is how I know ^[^:]+: matches what I want. I just have not been able to use that information to extract the matches.
I am aware of the stringr package. Perhaps it can help, but I much prefer a solution in base R.
Thank you for any advice.