Using a regex, how can I match comments that begin with a semicolon, unless the semicolon is surrounded on both sides by unescaped double quotes (dquotes), as shown below (the green blocks denote the matched comments)?
Note that dquotes can be escaped by doubling them up: ""
Such escaped dquotes behave as ordinary characters, i.e. they cannot surround the semicolon and disable its comment-starting function.
Also, unbalanced dquotes are treated as escaped dquotes.
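To pin down the rules above independently of any regex, here is a minimal reference scanner in Python. It is only a sketch of how I read the rules: the names find_comment_start and _find_closing_quote are made up for illustration, and a run of dquotes is assumed to be consumed left to right, so "" is always read as an escaped dquote first. It can serve as an oracle to check a candidate regex against.

```python
from typing import Optional


def _find_closing_quote(line: str, start: int) -> Optional[int]:
    """Return the index of the first unescaped dquote at or after start, or None."""
    j = start
    while j < len(line):
        if line.startswith('""', j):
            j += 2          # doubled dquote inside the span: escaped, skip both
        elif line[j] == '"':
            return j        # single dquote: closes the span
        else:
            j += 1
    return None


def find_comment_start(line: str) -> Optional[int]:
    """Return the index of the comment-starting semicolon, or None if there is none."""
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == ';':
            return i        # first semicolon outside any quoted span
        if ch == '"':
            if line.startswith('""', i):
                i += 2      # escaped dquote: ordinary character (assumed left-to-right pairing)
                continue
            close = _find_closing_quote(line, i + 1)
            if close is not None:
                i = close + 1   # balanced span: skip it, semicolons inside do not count
                continue
            # unbalanced dquote: treated like an escaped one, fall through
        i += 1
    return None


if __name__ == "__main__":
    for text in ('Peekaboo ; A comment',
                 'Simon said "I like goats;""peekaboo'):
        pos = find_comment_start(text)
        print(repr(text), '->', None if pos is None else repr(text[pos:]))
```

Under this reading, the dquote before "I like goats" in the last sample has no unescaped partner, so the semicolon right after "goats" still starts the comment.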
With Bubble's help, I have gotten as far as the regex below, which fails to handle a trailing escaped dquote correctly in the last test vector line.
^(?>(?:""[^""\n]*""|[^;""\n]+)*)""?[^"";\n]*(;.*)
See it run here.
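For anyone who wants to reproduce the failure locally, here is a rough Python sketch. Two assumptions on my side: the doubled dquotes in the pattern as posted are taken to be string-literal escaping for a single dquote, and the atomic group (?>...) is emulated with a lookahead plus backreference, (?=(...))\1, so that it also runs on Python versions older than 3.11. Because of that emulation the comment ends up in group 2 instead of group 1. The helper name match_comment is hypothetical.

```python
import re

# Assumed effective pattern (single dquotes), with the atomic group
# emulated as (?=(...))\1; the comment is captured by group 2.
PATTERN = re.compile(r'^(?=((?:"[^"\n]*"|[^;"\n]+)*))\1"?[^";\n]*(;.*)')


def match_comment(line):
    """Return the comment matched by the pattern, or None if it does not match."""
    m = PATTERN.match(line)
    return m.group(2) if m else None


if __name__ == "__main__":
    samples = (
        'Simon said ""I like goats;"peekaboo',   # second-to-last test vector
        'Simon said "I like goats;""peekaboo',   # last test vector
    )
    for text in samples:
        print(repr(text), '->', repr(match_comment(text)))
```

On the last test vector the pattern consumes "I like goats;" as a balanced quoted span, after which no semicolon is left and the atomic group prevents backtracking, so nothing is matched.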
Test vectors (the same as in the color-coded diagram above):
Peekaboo ; A comment starts with a semicolon and continues till the EOL
Unless the semicolon is surrounded by dquotes "Don't do it ; here" ;but match me; once
Im not surrounded "so pay attention to me" ; "peekaboo"
Im not surrounded "so pay attention" to;me" ; "peekaboo"
Im not surrounded "so pay attention to me ; peekaboo
Dquote escapes a dquote so "dont pay attention to ""me;here"" buster" do it ; here
Don't pay attention to """me;here""" but do ""it;here""
and "dont do ""it;here""" either ;peekaboo
but "pay attention to "it;here"" ;not here though
Simon said "I like goats" then he added "and sheep;" ;a good comment is "here
Simon said "I like goats" then he added "and sheep;" dont do it here
Simon said ""I like goats;"peekaboo
Simon said "I like goats;""peekaboo