I am trying to remove the text up to and including the first comma in a string that has one or more commas. For some reason, my regex always removes everything up to the last comma instead, for all strings.
The string looks like:
OCR - (some text), Variant - (some text), Bad Subtype - (some text)
and my regex is returning:
Bad Subtype - (some text)
when the desired output is:
Variant - (some text), Bad Subtype - (some text)
Variant is not guaranteed to be in the second position.
#select all strings beginning with OCR in the column Tags
clean <- subset(all, grepl("^OCR", all$Tags))
#trim the OCR text up to the first comma, and store in a new column called Tag
clean$Tag <- gsub(".*,", "", clean$Tags)
or
clean$Tag <- gsub(".*\\,", "", clean$Tags)
or
clean$Tag <- sub(".*,", "", clean$Tags)
etc.
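To make this reproducible, here is a minimal sketch that uses a single hypothetical sample string `x` in place of my real column (the name `x` and the sample value are for illustration only). It shows that both `gsub` and `sub` with `.*,` strip everything through the last comma, presumably because `.*` matches as much as it can. For comparison, a pattern that cannot cross a comma, `^[^,]*,[[:space:]]*`, appears to give the output I want on this sample, though I have not checked it against the full data.

```r
# Hypothetical sample value standing in for one entry of the Tags column
x <- "OCR - (some text), Variant - (some text), Bad Subtype - (some text)"

# Greedy: .* extends as far as possible, so the match ends at the LAST comma
greedy_gsub <- gsub(".*,", "", x)
greedy_sub  <- sub(".*,", "", x)

# [^,]* cannot cross a comma, so the match ends at the FIRST comma;
# [[:space:]]* also drops the whitespace that follows it
first_only <- sub("^[^,]*,[[:space:]]*", "", x)

print(greedy_gsub)
print(greedy_sub)
print(first_only)
```

Note that switching from `gsub` to `sub` does not help with the original pattern: `sub` only limits the number of replacements to one, and that single greedy match already runs to the last comma.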