I am trying to split each string into three parts: the name, the time (day and time of day), and the generic text that follows. The data originally looks like this:
data =
c("JENNIFER [Day 1, 9:00 A.M.]: Generic text, it doesn't matter what is going on here. There are more than 2 lines.",
"SAM [Day 2, 10:15 A.M.]: This doesn't matter. It has a lot of lines.",
"DAN'S [Day 4, 12:00 P.M.]: It doesn't really matter what's going on in this part.")
I was able to extract the first portion of each string, NAME [TIME]:, but I am having a hard time separating NAME from TIME.
match = regexpr("^[A-Z].*:", data)
regmatches(data, match)
This gives me:
JENNIFER [Day 1, 9:00 A.M.]:
SAM [Day 2, 10:15 A.M.]:
DAN'S [Day 4, 12:00 P.M.]:
I can see that the names are all in capital letters, so I would use "^[A-Z]", but this would also pick up every other sentence that begins with a capital letter.
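One direction I have been considering, as a rough sketch rather than a confirmed solution: require the run of capitals to be followed by " [" so that ordinary sentences are not matched. The vector x and the variable names below are made up for illustration, and I am assuming a name contains only capital letters and apostrophes.

```r
# Hypothetical sample in the same shape as my data
x <- c("JENNIFER [Day 1, 9:00 A.M.]: Generic text. More text here.",
       "DAN'S [Day 4, 12:00 P.M.]: It doesn't really matter.")

# Capital letters or apostrophes from the start of the string,
# matched only when a space and an opening bracket follow.
# The lookahead (?= ...) needs perl = TRUE.
name_match <- regexpr("^[A-Z']+(?= \\[)", x, perl = TRUE)
speaker <- regmatches(x, name_match)
print(speaker)
# [1] "JENNIFER" "DAN'S"
```

A sentence such as "Generic text." starts with a capital but is not followed by " [", so it would not match.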
In the end, I want to create a data frame like this:
Name      Date              Content
JENNIFER  Day 1 9:00 A.M.   "combined text"
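To make the target concrete, here is a minimal sketch of how the three columns might be built with capture groups and regexec. It assumes every string follows the NAME [Day N, TIME]: text layout exactly; x, pattern, parts, m and result are names I made up for this example, and strings that do not match would need separate handling.

```r
# Hypothetical sample in the same shape as my data
x <- c("JENNIFER [Day 1, 9:00 A.M.]: Generic text, it doesn't matter.",
       "SAM [Day 2, 10:15 A.M.]: This doesn't matter.")

# Capture groups: (name) [(day), (time)]: (content)
pattern <- "^([A-Z']+) \\[(Day [0-9]+), ([^]]+)\\]: (.*)$"
parts <- regmatches(x, regexec(pattern, x))

# Each element of parts is c(full match, name, day, time, content);
# stack them into a character matrix, one row per input string.
m <- do.call(rbind, parts)

result <- data.frame(Name = m[, 2],
                     Date = paste(m[, 3], m[, 4]),
                     Content = m[, 5],
                     stringsAsFactors = FALSE)
print(result)
```

Here Date simply pastes the day and the time together; keeping them as two separate columns would work just as well.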