R regex get the text between single quotes

Question

I have text like

la<-c("case when ANTIG_CLIENTE <= 4 then '01: ANTIG_CLIENTE <= 4' when ANTIG_CLIENTE <= 8 then '02: ANTIG_CLIENTE <= 8' 
else '99: Error' end ")

I want to extract the text between single quotes as a list:

"01: ANTIG_CLIENTE <= 4","02: ANTIG_CLIENTE <= 8","99: Error"

I tried two approaches, with no success:

> sub('[^\]+\"([^\']+).*', '\\1', la)
Error: '\]' is an unrecognized escape in character string starting "'[^\]"
> regmatches(x, gregexpr('"[^']*"', la))[[1]]
Error: unexpected ']' in "regmatches(x, gregexpr('"[^']"

How can I get the text between single quotes?

MichaelChirico · Accepted Answer · 2015-08-02T23:58:31.173

This should get what you want. The only assumption is that all of the strings you want between single quotes contain a colon (otherwise, how should we distinguish '01: ANTIG_CLIENTE <= 4' from ' when ANTIG_CLIENTE <= 8 then ', both of which are between single quotes?):

> regmatches(la,gregexpr("'[^']*:[^']*'",la))
[[1]]
[1] "'01: ANTIG_CLIENTE <= 4'" "'02: ANTIG_CLIENTE <= 8'" "'99: Error'"

Basically, we're trying to return every match (hence gregexpr instead of regexpr, which returns only the first match) of the form: single quote, anything but a single quote, colon, anything but a single quote, single quote.
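To make the pieces of that pattern concrete, here is a minimal sketch that assembles it part by part and compares regexpr with gregexpr. It assumes `la` is the string from the question (written on one line here); the names `pattern`, `first_only` and `all_matches` are just illustrative.

```r
# The string from the question, written on a single line (assumption:
# the line break before "else" does not matter for this illustration)
la <- "case when ANTIG_CLIENTE <= 4 then '01: ANTIG_CLIENTE <= 4' when ANTIG_CLIENTE <= 8 then '02: ANTIG_CLIENTE <= 8' else '99: Error' end "

# Build the pattern from its parts:
# quote, non-quotes, colon, non-quotes, quote
pattern <- paste0("'", "[^']*", ":", "[^']*", "'")

# regexpr() finds only the first match ...
first_only <- regmatches(la, regexpr(pattern, la))

# ... while gregexpr() finds every non-overlapping match
all_matches <- regmatches(la, gregexpr(pattern, la))[[1]]

print(first_only)
print(all_matches)
```

Note that gregexpr returns a list with one element per input string, which is why `[[1]]` is used to pull out the character vector for a single input.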

If you want to eliminate the single quotes in what is returned, you'll need look-ahead and look-behind, which requires telling R to use its Perl-compatible regex engine (perl=TRUE):

> regmatches(la,gregexpr("(?<=')[^']*:[^']*(?=')",la,perl=TRUE))
[[1]]
[1] "01: ANTIG_CLIENTE <= 4" "02: ANTIG_CLIENTE <= 8" "99: Error"
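The colon matters most in the look-around version: look-arounds do not consume the quotes, so a closing quote can also act as the "opening" quote of the next match. The sketch below illustrates that, and also shows one possible alternative that avoids look-arounds by matching with the quotes and stripping them afterwards. It assumes `la` is the string from the question (written on one line); `loose`, `quoted` and `unquoted` are illustrative names.

```r
# The string from the question, written on a single line (assumption)
la <- "case when ANTIG_CLIENTE <= 4 then '01: ANTIG_CLIENTE <= 4' when ANTIG_CLIENTE <= 8 then '02: ANTIG_CLIENTE <= 8' else '99: Error' end "

# Without the colon, the look-around pattern also picks up the text that
# sits between a closing quote and the next opening quote
loose <- regmatches(la, gregexpr("(?<=')[^']*(?=')", la, perl = TRUE))[[1]]
print(loose)

# Alternative without look-arounds: match including the quotes,
# then remove the quote characters from each match
quoted <- regmatches(la, gregexpr("'[^']*:[^']*'", la))[[1]]
unquoted <- gsub("'", "", quoted, fixed = TRUE)
print(unquoted)
```

The second approach works with R's default regex engine, at the cost of an extra gsub step.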

edited Aug 02 '15 at 23:58

answered Aug 02 '15 at 23:41


Thanks for your answer. The string never starts with a single quote. I get an error with `x<-unlist(strsplit(la,split="'"))`: `Error in strsplit(la, split = "'") : non-character argument` – Oscar Benitez Aug 02 '15 at 23:47
@OscarBenitez see update. The revision is a better answer. – MichaelChirico Aug 02 '15 at 23:53
@OscarBenitez re: your error, the only thing I can think of is that possibly `class(la)!="character"` – MichaelChirico Aug 02 '15 at 23:54
You could remove the quotes themselves by using perl look-aheads and look-behinds - `regmatches(la,gregexpr("(?<=')[0-9].+?(?=')",la,perl=TRUE))` – thelatemail Aug 02 '15 at 23:56
@thelatemail yep, was just about to edit that in. thanks :) – MichaelChirico Aug 02 '15 at 23:57
