I have built a parser in Sprache and C# for files using a format I don't control. Using it I can correctly convert:

a = "my string";

into

my string

The parser (for the quoted text only) currently looks like this:

public static readonly Parser<string> QuotedText =
    from open in Parse.Char('"').Token()
    from content in Parse.CharExcept('"').Many().Text().Token()
    from close in Parse.Char('"').Token()
    select content;

However the format I'm working with escapes quotation marks using "double doubles" quotes, e.g.:

a = "a ""string"".";

When attempting to parse this nothing is returned. It should return:

a ""string"".

Additionally

a = "";

should be parsed into a string.Empty or similar.

I've tried regexes based on answers like this, without success, using patterns such as "(?:[^;])*", or:

public static readonly Parser<string> QuotedText =
    from content in Parse.Regex("""(?:[^;])*""").Token()

This doesn't work (i.e. no matches are returned in the above cases). I think my beginner's regex skills are getting in the way. Does anybody have any hints?

EDIT: I was testing it here - http://regex101.com/r/eJ9aH1

will-hart

4 Answers

If I'm understanding you correctly, this is the kind of regex you're looking for:

"(?:""|[^"])*"

See the demo.

1. `"` matches the opening quote.
2. `(?:""|[^"])*` matches either two consecutive quotes or any character that is not a quote (including newlines), repeated zero or more times.
3. `"` matches the closing quote.
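To show how the pattern behaves on the inputs from the question, here is a minimal C# sketch using only `System.Text.RegularExpressions`. The class name `QuotedTextDemo` and the helper `Extract` are made up for illustration; they are not part of Sprache or of the question's code. The sketch keeps the doubled quotes in the result, since that is the output the question asks for.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedTextDemo
{
    // The pattern "(?:""|[^"])*" written as a C# verbatim string:
    // inside @"..." every literal quote has to be doubled.
    private static readonly Regex Quoted = new Regex(@"""(?:""""|[^""])*""");

    // Hypothetical helper: returns the text between the outer quotes,
    // or null when the input contains no quoted string.
    public static string Extract(string input)
    {
        Match m = Quoted.Match(input);
        if (!m.Success)
        {
            return null;
        }
        // Drop the opening and closing quote, keep everything in between.
        return m.Value.Substring(1, m.Value.Length - 2);
    }

    public static void Main()
    {
        Console.WriteLine(Extract("a = \"a \"\"string\"\".\";"));   // prints: a ""string"".
        Console.WriteLine(Extract("a = \"\";") == string.Empty);    // prints: True
    }
}
```

If the unescaped text is wanted instead, calling `.Replace("\"\"", "\"")` on the extracted string should be enough.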

But it's always going to boil down to whether your input is balanced. If it is not, you'll get false positives. And if you have a string such as `"string""`, which should be matched: `"string""`, `""`, or nothing? That's a tough decision, one that, fortunately, you don't have to make if you are sure of your input.
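The same rule can probably also be written with Sprache combinators instead of a regex. The following is only a sketch: it assumes the Sprache NuGet package is referenced, the names `QuotedTextParser` and `Chunk` are invented for this example, and it keeps the doubled quotes in the result as in the question's expected output.

```csharp
using System;
using Sprache;

public static class QuotedTextParser
{
    // One piece of content: either an escaped quote ("") or a single
    // character that is not a quote. The escaped form is tried first.
    private static readonly Parser<string> Chunk =
        Parse.String("\"\"").Text()
            .Or(Parse.CharExcept('"').Once().Text());

    public static readonly Parser<string> QuotedText =
        // Whitespace is skipped before the opening quote only; calling Token()
        // on the opening quote would also swallow spaces at the start of the content.
        from leading in Parse.WhiteSpace.Many()
        from open in Parse.Char('"')
        from content in Chunk.Many()
        from close in Parse.Char('"').Token()
        select string.Concat(content);

    public static void Main()
    {
        Console.WriteLine(QuotedText.Parse("\"a \"\"string\"\".\""));   // prints: a ""string"".
        Console.WriteLine(QuotedText.Parse("\"\"") == string.Empty);    // prints: True
    }
}
```

Because `Chunk` never consumes a lone quote, the repetition stops at the closing quote, and an empty pair of quotes yields an empty string.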

zx81
  • Thanks - I can fairly safely assume the files I get are balanced inputs as they go through a lint process before I get to them. I'll try this out in C#. – will-hart Jun 13 '14 at 08:42

You can likely adapt this pattern to get your desired output:

"(.+".+")"|(".+?")|("")

example:

http://regex101.com/r/lO1vZ4

l'L'l

If you only want to ignore consecutive double quotes, try this:

("{2,})

Live demo

CMPS
  • As can be seen in your demo, this still doesn't ignore `""`. – Amal Murali Jun 13 '14 at 02:20
  • @AmalMurali I updated my regex. Amal, the regex just selects the matching pattern for you; it does not ignore or replace anything. To modify the selected part, you need to use a method that replaces what is between the parentheses with an empty string. – CMPS Jun 13 '14 at 02:22
  • As the title of the question says, the OP is trying to create a "*regex for ignoring consecutive quotation marks in string*". Your regex still matches consecutive double-quotes, which the OP *doesn't* want. – Amal Murali Jun 13 '14 at 02:25
  • I am matching the double quotes so they can be replaced by an empty string; isn't that easier? – CMPS Jun 13 '14 at 02:32
  • No, because in the process of matching consecutive double-quotes, you are also matching `"""`, `""""` etc. – Amal Murali Jun 13 '14 at 02:34
  • I can fix this, but he/she didn't mention that only 2 consecutive double-quotes must be ignored @AmalMurali – CMPS Jun 13 '14 at 02:36

This regex "("+) might help you to match extra unwanted double quotes.

Here is the demo.

Braj