
Can someone provide a regular expression for parsing name/value pairs from a string? The pairs are separated by commas, and the value can optionally be enclosed in quotes. For example:

AssemblyName=foo.dll,ClassName="SomeClass",Parameters="Some,Parameters"
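The example shows why a plain split is not enough. Here is a quick check in Python (the language is only an illustration; the question does not name one): splitting on every comma cuts the quoted value apart.

```python
# Naive approach: split the whole string on every comma.
s = 'AssemblyName=foo.dll,ClassName="SomeClass",Parameters="Some,Parameters"'

parts = s.split(",")
print(parts)
# The quoted value "Some,Parameters" is cut in two, leaving a last
# fragment ('Parameters"') that contains no "=" at all.
```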
Chris Karcher

3 Answers

  • No escape:

    /([^=,]*)=("[^"]*"|[^,"]*)/
    
  • Double quote escape for both key and value:

    /((?:"[^"]*"|[^=,])*)=((?:"[^"]*"|[^=,])*)/
    
    key=value,"key with "" in it"="value with "" in it",key=value" "with" "spaces
    
  • Backslash string escape:

    /([^=,]*)=("(?:\\.|[^"\\]+)*"|[^,"]*)/
    
    key=value,key="value",key="val\"ue"
    
  • Full backslash escape:

    /((?:\\.|[^=,]+)*)=("(?:\\.|[^"\\]+)*"|(?:\\.|[^,"\\]+)*)/
    
    key=value,key="value",key="val\"ue",ke\,y=val\,ue
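To try the first ("No escape") pattern on the question's own example, here is a minimal sketch in Python (the language is an assumption; the patterns themselves are language-neutral once the /.../ delimiters are dropped):

```python
import re

# "No escape" pattern: key is anything up to "=", value is either a
# double-quoted run with no inner quotes, or a run with no commas/quotes.
PAIR = re.compile(r'([^=,]*)=("[^"]*"|[^,"]*)')

s = 'AssemblyName=foo.dll,ClassName="SomeClass",Parameters="Some,Parameters"'
pairs = PAIR.findall(s)  # list of (key, value) tuples
for key, value in pairs:
    print(key, value)
```

Note that quoted values keep their surrounding quotes, so "Some,Parameters" survives intact but still needs the quotes stripped.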
    

Edit: Added escaping alternatives.

Edit2: Added another escaping alternative.

You would then have to clean up the keys/values by removing any escape characters and surrounding quotes.
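A minimal sketch of that clean-up in Python, assuming the "Backslash string escape" pattern; `clean_value` is a hypothetical helper name, not part of the answer. Only quoted values are unescaped, since that pattern only allows escapes inside quotes.

```python
import re

# "Backslash string escape" pattern, with the /.../ delimiters dropped.
PAIR = re.compile(r'([^=,]*)=("(?:\\.|[^"\\]+)*"|[^,"]*)')

def clean_value(raw):
    """Hypothetical helper: strip surrounding quotes and undo backslash escapes."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        raw = raw[1:-1]
        # Replace each backslash + character with just that character.
        raw = re.sub(r'\\(.)', r'\1', raw)
    return raw

s = r'key=value,key="value",key="val\"ue"'
pairs = [(k, clean_value(v)) for k, v in PAIR.findall(s)]
print(pairs)
```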

Markus Jarderot
  • This works for my simple scenario! Though, it might be nice for it to support including a quote in the value by escaping it, either double ("") or with a backslash (\") – Chris Karcher Oct 03 '08 at 18:39
  • Can you please help me? I need something similar, but more like JSON: http://stackoverflow.com/questions/6099891/json-text-split-reg-expression-or-parser – Val May 23 '11 at 16:35
  • What is the regex for key=value&key=value, where key or value can be null and key and value can be anything? – virsha Apr 08 '17 at 04:19
  • @virsha Please [add a new question](/questions/ask) if you ask for something different. – Markus Jarderot Apr 09 '17 at 11:20

Nice answer from MizardX. Minor niggles: it doesn't allow for spaces around names etc. (which may not matter), it collects the quotes as well as the quoted value (which also may not matter), and it doesn't have an escape mechanism for embedding double quote characters in the quoted value (which, once more, may not matter).

As written, that pattern works with most extended regular expression engines. Fixing the niggles would probably require descent into, say, Perl. This version uses doubled quotes to escape, so a="a""b" generates a field value 'a""b' (which ain't perfect, but could easily be fixed afterwards):

/\s*([^=,\s]+)\s*=\s*(?:"((?:[^"]|"")*)"|([^,"]*))\s*,?/

Further, you'd have to use $2 or $3 to collect the value, whereas with MizardX's answer, you simply use $2. So, it isn't as easy or nice, but it covers a few edge cases. If the simpler answer is adequate, use it.
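Here is roughly how the same pattern might be driven from Python (an assumption; the answer's own test script is Perl). `parse_pairs` is a hypothetical helper name. It takes whichever of groups 2 and 3 participated in the match (the other is `None`, just as `$2` or `$3` is undefined in Perl) and does the after-the-fact fix of turning `""` back into `"`.

```python
import re

# Same pattern as above, with the /.../ delimiters dropped.
QR = re.compile(r'\s*([^=,\s]+)\s*=\s*(?:"((?:[^"]|"")*)"|([^,"]*))\s*,?')

def parse_pairs(text):
    """Hypothetical helper: return (name, value) pairs from text."""
    result = []
    for m in QR.finditer(text):
        if m.group(2) is not None:
            # Quoted value: collapse each doubled quote "" into a single ".
            value = m.group(2).replace('""', '"')
        else:
            # Unquoted value (group 3).
            value = m.group(3)
        result.append((m.group(1), value))
    return result

print(parse_pairs('a = "a""b" , b=plain, c="x,y"'))
```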

Test script:

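#!/usr/bin/perl -w

use strict;

# $1 = name, $2 = quoted value (doubled quotes kept), $3 = unquoted value
my $qr = qr/\s*([^=,\s]+)\s*=\s*(?:"((?:[^"]|"")*)"|([^,"]*))\s*,?/;

while (<>)
{
    # Report the first remaining pair, then delete it from the line and repeat.
    while (m/$qr/)
    {
        print "1 = $1, 2 = $2, 3 = $3\n";
        $_ =~ s/$qr//;
    }
}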

Under -w, this complains that either $2 or $3 is undefined, and accurately so: only one of the two is set for any given pair.

Jonathan Leffler

This is how I would do it if you can use Perl 5.10.

qr/
  (?<key>
    (?:
      [^=,\\]
    |
      (?&escape)
    )++ # Prevent null keys
  )

  \s*+
  =
  \s*+

  (?<value>
    (?&quoted)
  |
    (?:
      [^=,\s\\]
    |
      (?&escape)
    )++ # Prevent null value ( use quotes for that )
  )

  (?(DEFINE)
    (?<escape>\\.)
    (?<quoted>
      "
        (?:
          (?&escape)
        |
          [^"\\]
        )*+
      "
    )
  )
/x

The captured elements are then available through the %+ hash, as $+{key} and $+{value}.
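For comparison, a sketch of the same idea in Python (used here purely as an illustration; this answer targets Perl 5.10). Python's `re` supports neither `(?(DEFINE)...)` nor `(?&name)`, but since `escape` and `quoted` are not recursive they can simply be pasted in as strings. Named groups then play the role of `%+`. The possessive quantifiers are swapped for ordinary greedy ones in this sketch (Python only gained possessive quantifiers in 3.11).

```python
import re

# Subpatterns that Perl keeps in (?(DEFINE)...), pasted in as plain strings.
ESCAPE = r'\\.'                              # backslash followed by any character
QUOTED = r'"(?:' + ESCAPE + r'|[^"\\])*"'    # double-quoted string with escapes

# Greedy + stands in for Perl's possessive ++; both require at least one char.
KEY = r'(?P<key>(?:[^=,\\]|' + ESCAPE + r')+)'
VALUE = r'(?P<value>' + QUOTED + r'|(?:[^=,\s\\]|' + ESCAPE + r')+)'
PAIR = re.compile(KEY + r'\s*=\s*' + VALUE)

s = r'a=1,b="x, \"y\"",c\,d=e'
pairs = [(m.group('key'), m.group('value')) for m in PAIR.finditer(s)]
for key, value in pairs:
    print(key, '->', value)
```

As with the other answers, the quotes and backslashes are still part of the captured text and would need a separate clean-up pass.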

perlretut was very helpful in creating this answer.

Brad Gilbert