Is there any open source solution, or a generic regex for parsing name-value (key-value) pairs out of a random String, in Java, with the (optional) delimiters stripped out?

From the question "Regular expression for parsing name value pairs", one such regex could be:

"((?:\"[^\"]*\"|[^=,])*)=((?:\"[^\"]*\"|[^=,])*)"

However, that regex (and its variations in the aforementioned question), although it works as expected, returns the delimiters along with the value.

For instance, a pair like key="value" will produce {key, "value"} instead of {key, value}.

The latter form of output would be nicer, since it avoids string post-processing to remove the enclosing delimiters (quotes in this case).
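To make the problem concrete, here is a minimal sketch that runs the regex quoted above against the example pair. The class name `QuotedPairDemo` and the helper `firstPair` are made up for this illustration; only the pattern comes from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedPairDemo {
    // The regex quoted in the question, unchanged.
    static final Pattern PAIR = Pattern.compile(
            "((?:\"[^\"]*\"|[^=,])*)=((?:\"[^\"]*\"|[^=,])*)");

    // Hypothetical helper: returns {key, value} for the first pair, or null if none.
    static String[] firstPair(String input) {
        Matcher m = PAIR.matcher(input);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] pair = firstPair("key=\"value\"");
        // The value group still carries its enclosing quotes.
        System.out.println(pair[0] + " -> " + pair[1]); // key -> "value"
    }
}
```

The second group comes back as `"value"`, quotes included, which is exactly the post-processing step I would like to avoid.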

PNS
  • Is there a specific format that it needs to support? If so, please specify the format. If not, then you can simplify the above pattern to `"\"([^\"]*)\"=\"([^\"]*)\""` (i.e., requiring that the key and value always be wrapped in double-quotes, and grabbing only the part inside the quotes). – ruakh Jan 25 '12 at 16:36
  • The format is as in my example: key=value or key="value" (i.e., the quotes are optional but may be present). – PNS Jan 25 '12 at 16:45
  • Can the key be wrapped in quotes? What happens if a key or value needs to contain an actual double-quote character? Will they always be separated by commas? Can there be a trailing comma? – ruakh Jan 25 '12 at 16:48
  • If you say it's a random string, how would you like to parse strings like `a"b"=c=d="xy=z\""`? – Prashant Bhate Jan 25 '12 at 22:25

1 Answer

If you want the quotes to be optional without including them in either the key or the value capture, you can do something like this (using your regex as an example, and also allowing single quotes as delimiters).

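Capture groups 2 and 4 contain the key and the value, respectively (without quotes). The patterns are spread out with whitespace for readability, so in Java either compile them with Pattern.COMMENTS or strip the whitespace first.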

"
 (['\"]?)  ([^'\"=,]+)  \1
 =
 (['\"]?)  ([^'\"=,]+)  \3
"

However, this will also collect garbage: any two runs of text separated by an = sign will match.
I think it's better to use a character class that restricts keys and values to a limited set of acceptable characters instead.
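As a rough sketch of how the first pattern above could be tried in Java, and of the garbage it accepts: the class name `SimplePairParser` and the method `parse` are invented for this example, and the input is assumed to have no spaces around the separators.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimplePairParser {
    // First pattern from the answer. Pattern.COMMENTS tells Java to ignore
    // the whitespace that is only there to lay the pattern out.
    static final Pattern PAIR = Pattern.compile(
            "(['\"]?)  ([^'\"=,]+)  \\1"
          + " = "
          + "(['\"]?)  ([^'\"=,]+)  \\3",
            Pattern.COMMENTS);

    // Hypothetical helper: collects key (group 2) and value (group 4) in input order.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(2), m.group(4));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Quotes are matched by groups 1 and 3, so they stay out of the result.
        System.out.println(parse("key=\"value\",name='x',a=b")); // {key=value, name=x, a=b}
        // Anything on either side of '=' is accepted, including junk.
        System.out.println(parse("@@@=###")); // {@@@=###}
    }
}
```

The second call shows the weakness: the pattern has no notion of what a sensible key or value looks like.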

Something like this is what I would use.

"
 (['\"]?) \s* (\w[-:\s\w]*?) \s* \1
 \s* = \s*
 (['\"]?) \s* (\w[-:\s\w]*?) \s* \3
"

A possible greedy version of the key/value sub-pattern is

\w+ (?: \s+[-:\w]+ )*
or
[-:\w]+ (?: \s+[-:\w]+ )*

which fits into the full pattern like this:

"
 (['\"]?) \s* (\w+(?:\s+[-:\w]+)*) \s* \1
 \s* = \s*
 (['\"]?) \s* (\w+(?:\s+[-:\w]+)*) \s* \3
"
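Here is a hedged sketch of the greedy pattern above in Java. The names `StrictPairParser` and `parse` are made up for the example, and comma-separated input is an assumption. One reason to prefer the greedy form: in the lazy version, an unquoted value has nothing after it that forces the lazy quantifier to expand, so group 4 would stop after the first character.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictPairParser {
    // Greedy pattern from the answer, compiled in free-spacing mode.
    // Groups 1 and 3 hold the optional quote; groups 2 and 4 hold key and value.
    static final Pattern PAIR = Pattern.compile(
            "(['\"]?) \\s* (\\w+(?:\\s+[-:\\w]+)*) \\s* \\1"
          + " \\s* = \\s* "
          + "(['\"]?) \\s* (\\w+(?:\\s+[-:\\w]+)*) \\s* \\3",
            Pattern.COMMENTS);

    // Hypothetical helper: returns the pairs in the order they appear.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(2), m.group(4));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Surrounding whitespace and quotes are dropped; the junk pair is skipped.
        System.out.println(parse("first name = \"John Smith\", id=42, @@@=###"));
        // {first name=John Smith, id=42}
    }
}
```

Note that with this variant each word has to start at a `\w` run, so a value such as a hyphenated date would need the `[-:\w]+` alternative instead.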