1

I need help designing a Perl regular expression to match the string inside single quotes wherein escaped single quotes may be present.

For instance, the input text:

'SELECT * FROM TABLE WHERE COLUMN = \'text\''

Would match everything inside the outer single quotes, including the escaped quotes around the column text. I.e.:

SELECT * FROM TABLE WHERE COLUMN = \'text\'

I tried this:

/\s*'([^'|[^\\']]*)'\s*/

But that matching group failed to match anything at all. Any help would be appreciated.

Wiktor Stribiżew
Lee Fowler

4 Answers

3

You can use the following regex:

/'((?:\\.|[^'\\])*)'/

Or an unrolled version that yields better performance:

/'([^'\\]*(?:\\.[^'\\]*)*)'/

See Demo 1 and Demo 2

REGEX EXPLANATION:

  • ' - Initial single quote
  • ((?:\\.|[^'\\])*) - Capturing group consisting of
    • (?:\\.|[^'\\])* - 0 or more occurrences of either an escaped character (a backslash followed by any one character) or any character other than ' and \
  • ' - Final single quote
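
The same breakdown applies to the unrolled pattern. As a rough sketch, here it is written with the `/x` flag so that each piece can carry a comment. The names `$quoted` and `$input` are illustrative and not part of the answer above.

```perl
use strict;
use warnings;

# Unrolled pattern, spread out with /x so each part can be commented.
my $quoted = qr{
    '                 # opening single quote
    (                 # capture group 1: the string body
        [^'\\]*       #   run of characters that are neither a quote nor a backslash
        (?:           #   then, zero or more times:
            \\.       #     a backslash followed by any one character
            [^'\\]*   #     and another run of ordinary characters
        )*
    )
    '                 # closing single quote
}x;

# q{} keeps the backslashes, so this holds the literal text from the question.
my $input = q{'SELECT * FROM TABLE WHERE COLUMN = \'text\''};

if ( $input =~ $quoted ) {
    print "$1\n";    # prints: SELECT * FROM TABLE WHERE COLUMN = \'text\'
}
```

Unrolling avoids trying the alternation at every character: ordinary characters are consumed in runs, and the group is only entered when a backslash is seen.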

Demo:

my $str = "'SELECT * FROM TABLE WHERE COLUMN = \\'text\\'' ";
print "$str\n";
if ( $str =~ /'([^'\\]*(?:\\.[^'\\]*)*)'/ ) {
    print "$1\n";
}

Output of the demo program (the captured group):

SELECT * FROM TABLE WHERE COLUMN = \'text\'
Wiktor Stribiżew
  • Great solution. But may I ask, what does the ?: syntax denote? – Lee Fowler Jun 04 '15 at 21:16
  • `?:` marks a group as a non-capturing one, i.e. we won't be able to use a back-reference to it. Very handy when we do not want to keep track of a captured group. Actually, the regex can be improved if we "unroll" it. I added the unrolled version. – Wiktor Stribiżew Jun 04 '15 at 21:18
  • I see you escaped the escape char manually. Is there a way around that to be able to capture the escapes when a string is manually assigned to a variable within code? – stevieb Jun 04 '15 at 21:30
  • If you are asking about the `$str` value that I used, I did escape the "\" for it to be a literal. I thought your input string contains single quotes that are preceded by literal "\"s. If you are asking about "\'" in a Perl string literal, it is interpolated into a literal `'`, so you won't see the backslash in the output. You'd need to manually add a backslash before each quote with another piece of code. – Wiktor Stribiżew Jun 04 '15 at 21:39
  • Technically, it is impossible to know where the quoted string ends and the other begins ([see here](https://regex101.com/r/mU8wC8/4)). So, if you do not have literal escape symbols, there can hardly be a regex solution. – Wiktor Stribiżew Jun 04 '15 at 21:53
  • @stribizhev I used the negation `[^\\]` instead of `[^'\\]`. This also gives the same result. Can you explain your pattern? – mkHun Jun 05 '15 at 04:11
  • 1
    @Hussain: I updated my answer, and I want to stress that `'([^'\\]*(?:\\.[^'\\]*)*)'` matches the same strings as `'((?:\\.|[^'\\])*)'`. I explained the latter. – Wiktor Stribiżew Jun 05 '15 at 06:41
  • WARNING there is an insane silent failure with this regex if the string is >32k/64k due to a Perl bug from 2002 that has not yet been fixed in 2020!!! https://stackoverflow.com/a/26229500/1046167 – Louis Semprini May 08 '20 at 21:26
  • @LouisSemprini You should mention that this bug is related to the non-unrolled regex version. In **Java**, use `String regex= "(?s)'([^'\\\\]*(?:\\\\.[^'\\\\]*)*)'";`. – Wiktor Stribiżew May 08 '20 at 21:34
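
The point raised in the comments about escapes in Perl string literals can be checked directly. The sketch below is not from the thread; it only shows how the same source characters `\'` end up in memory under different quoting styles. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# The same source characters \' written in four kinds of Perl literal.
my $double  = "COLUMN = \'text\'";     # in "...", \' is just a quote (backslash dropped)
my $single  = 'COLUMN = \'text\'';     # in '...', \' escapes the delimiter (backslash dropped)
my $generic = q{COLUMN = \'text\'};    # in q{...}, the quote is not a delimiter, so the backslash stays
my $escaped = "COLUMN = \\'text\\'";   # in "...", \\ is one literal backslash

print "$_\n" for $double, $single, $generic, $escaped;
# prints:
# COLUMN = 'text'
# COLUMN = 'text'
# COLUMN = \'text\'
# COLUMN = \'text\'
```

So a regex that looks for backslash-escaped quotes only has something to find when the test string is written with `q{...}` or with doubled backslashes.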
0
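#!/usr/bin/perl
use strict;
use warnings;

# In a double-quoted literal, \' is just ', so the backslashes are dropped here.
my $string = "'SELECT * FROM TABLE WHERE COLUMN = \'text\''";

# Replace $string with the capture only if the match succeeded.
$string = $1 if $string =~ /^'(.*)'$/;

print "$string\n";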

Output:

SELECT * FROM TABLE WHERE COLUMN = 'text'

When the input comes from an external source (i.e. it is not written as a string literal in the code itself), the backslashes are preserved and the above regex works:

open my $fh, '<', 'in.txt' or die "Cannot open in.txt: $!";

my $string = <$fh>;

# $ also matches before the trailing newline, so no chomp is needed.
$string = $1 if $string =~ /^'(.*)'$/;

print "$string\n";

input file:

$ cat in.txt 
'SELECT * FROM TABLE WHERE COLUMN = \'text\''

Output:

SELECT * FROM TABLE WHERE COLUMN = \'text\'

stevieb
  • Ideally the matched text would include the escape character. I.e. `SELECT * FROM TABLE WHERE COLUMN = \'text\'` – Lee Fowler Jun 04 '15 at 21:05
  • Is this string being brought in from an external source, and not being manually assigned to a string var inside your code? If so, it works as you desire. I've updated my answer as an example. Let me know... if you are manually assigning inside your code, I'll do more testing. – stevieb Jun 04 '15 at 21:24
  • This will work only in case you have one quoted string with leading/trailing single quotes. In case you have many, say, delimited ones, this will fail. – Wiktor Stribiżew Jun 04 '15 at 21:28
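
The limitation mentioned in the last comment can be seen with a small sketch. The input line here is hypothetical (two quoted strings separated by a comma), and the second pattern is the unrolled one from the accepted answer.

```perl
use strict;
use warnings;

# Hypothetical input: two quoted strings on one line, the first with an escaped quote.
my $line = q{'it\'s', 'second'};

# Greedy, anchored pattern: captures from the first quote to the last one.
my ($greedy) = $line =~ /^'(.*)'$/;
print "$greedy\n";    # prints: it\'s', 'second

# Escape-aware pattern with /g: one capture per quoted string.
my @parts = $line =~ /'([^'\\]*(?:\\.[^'\\]*)*)'/g;
print "$_\n" for @parts;    # prints it\'s and then second, one per line
```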
0

I think the regex you're looking for is this:

/\s*'(([^']|[\\'])*)'\s*/
Chris Turner
0

If this is just for dumped data from a known source, you could just eval it.

use feature 'say';

my $str = q{ 'SELECT * FROM TABLE WHERE COLUMN = \'text\'' };
my $out = eval $str;
say $out;
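
One difference from the regex approach is worth seeing side by side: `eval` parses the text as a Perl single-quoted literal, so the backslashes are consumed, whereas a regex capture keeps them. A minimal sketch, assuming the same input as above (`$evaled` and `$captured` are illustrative names):

```perl
use strict;
use warnings;
use feature 'say';

# q{} keeps the backslashes, so $str holds Perl source for a single-quoted literal.
my $str = q{ 'SELECT * FROM TABLE WHERE COLUMN = \'text\'' };

# eval parses that literal, so each \' collapses to a plain quote in the result.
my $evaled = eval $str;
say $evaled;      # prints: SELECT * FROM TABLE WHERE COLUMN = 'text'

# The regex capture keeps the backslashes instead.
my ($captured) = $str =~ /'([^'\\]*(?:\\.[^'\\]*)*)'/;
say $captured;    # prints: SELECT * FROM TABLE WHERE COLUMN = \'text\'
```

Since string `eval` runs whatever Perl code the input contains, this only suits data from a source you fully trust, as the answer says.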
cliveholloway