
I have text like

la<-c("case when ANTIG_CLIENTE <= 4 then '01: ANTIG_CLIENTE <= 4' when ANTIG_CLIENTE <= 8 then '02: ANTIG_CLIENTE <= 8' 
else '99: Error' end ")

I want to extract the text between single quotes as a list:

"01: ANTIG_CLIENTE <= 4","02: ANTIG_CLIENTE <= 8","99: Error"

I tried two approaches, without success:

> sub('[^\]+\"([^\']+).*', '\\1', la)
Error: '\]' is an unrecognized escape in character string starting "'[^\]"
> regmatches(x, gregexpr('"[^']*"', la))[[1]]
Error: unexpected ']' in "regmatches(x, gregexpr('"[^']"

How can I get the text between single quotes?
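Both attempts above fail because of R string syntax, not the regex logic. \] is not a valid escape inside an R string literal. In the second attempt, the single-quoted R string is closed early by the ' inside the pattern. That attempt also passes x to regmatches() instead of la, and it looks for double quotes rather than single quotes. The sketch below is a corrected version of the second attempt. It puts the R string in double quotes so the pattern can contain single quotes, and it assumes the quotes in the input are always balanced:

```r
# Sample input from the question (the line break is written as \n here)
la <- "case when ANTIG_CLIENTE <= 4 then '01: ANTIG_CLIENTE <= 4' when ANTIG_CLIENTE <= 8 then '02: ANTIG_CLIENTE <= 8' \nelse '99: Error' end "

# Double-quoted R string, so the regex can hold single quotes unescaped;
# the same vector (la) is passed to both gregexpr() and regmatches()
quoted <- regmatches(la, gregexpr("'[^']*'", la))[[1]]
print(quoted)
```

gregexpr() resumes scanning after each closing quote, so on this particular input the matches do not overlap. This simpler pattern therefore returns the same three quoted strings as the colon-based pattern in the answer. The colon requirement acts as a stricter safeguard.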

Oscar Benitez

1 Answer


This should get what you want. The only assumption is that all of the strings you want between single quotes contain a colon (otherwise, how should we distinguish '01: ANTIG_CLIENTE <= 4' from ' when ANTIG_CLIENTE <= 8 then ', both of which are between single quotes?):

> regmatches(la,gregexpr("'[^']*:[^']*'",la))
[[1]]
[1] "'01: ANTIG_CLIENTE <= 4'" "'02: ANTIG_CLIENTE <= 8'" "'99: Error'"   

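Basically, we're asking for every match (hence gregexpr, which returns all matches, instead of regexpr, which returns only the first) of the form: a single quote, any run of characters other than a single quote, a colon, any run of characters other than a single quote, and a closing single quote.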
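The sketch below makes this concrete by spelling out each piece of the pattern. It also contrasts regexpr(), which reports only the first match, with gregexpr(), which reports all of them. The variable names pattern, first_only and all_matches are illustrative only:

```r
la <- "case when ANTIG_CLIENTE <= 4 then '01: ANTIG_CLIENTE <= 4' when ANTIG_CLIENTE <= 8 then '02: ANTIG_CLIENTE <= 8' \nelse '99: Error' end "

# The pattern, piece by piece:
#   '       opening single quote
#   [^']*   any run of characters that are not single quotes
#   :       a literal colon
#   [^']*   more non-quote characters
#   '       closing single quote
pattern <- "'[^']*:[^']*'"

# regexpr() finds only the first match in each input string
first_only <- regmatches(la, regexpr(pattern, la))

# gregexpr() finds every match; regmatches() returns a list with one
# element per input string, so [[1]] takes the matches for la
all_matches <- regmatches(la, gregexpr(pattern, la))[[1]]

print(first_only)
print(all_matches)
```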

If you want to drop the single quotes from what is returned, one way is to use look-behind and look-ahead. R's default regex engine does not support these, so you have to tell R to interpret your regex as Perl-compatible (perl = TRUE):

> regmatches(la,gregexpr("(?<=')[^']*:[^']*(?=')",la,perl=TRUE))
[[1]]
[1] "01: ANTIG_CLIENTE <= 4" "02: ANTIG_CLIENTE <= 8" "99: Error" 
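If you would rather avoid perl = TRUE, you can use a different technique from the look-arounds above. Match with the quotes included, then trim the first and last character of each match with substr(). The sketch below does this and checks it against the answer's Perl pattern. The variable names are illustrative only:

```r
la <- "case when ANTIG_CLIENTE <= 4 then '01: ANTIG_CLIENTE <= 4' when ANTIG_CLIENTE <= 8 then '02: ANTIG_CLIENTE <= 8' \nelse '99: Error' end "

# Match the quoted labels with the default (non-Perl) engine, quotes included
quoted <- regmatches(la, gregexpr("'[^']*:[^']*'", la))[[1]]

# Drop the first and last character of each match (the enclosing quotes)
without_quotes <- substr(quoted, 2, nchar(quoted) - 1)

# For comparison: the look-behind/look-ahead version, which needs perl = TRUE
with_perl <- regmatches(la, gregexpr("(?<=')[^']*:[^']*(?=')", la, perl = TRUE))[[1]]

print(without_quotes)
print(all(without_quotes == with_perl))
```

The substr() route relies on every match starting and ending with exactly one quote character. The pattern guarantees this.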
MichaelChirico