4

I'm trying to complete a PHP app in the next 2 weeks and I just can't figure out the regular expression to parse some attribute strings.

I get arbitrary strings in a format like this:

KeyName1="KeyValue1" KeyName2='KeyValue2'

There may be any number of key value pairs in a single string and the values can be delimited by either single quotes ' or double quotes " in any combination within one string (but they are always delimited).

The key values can be of any length and contain any character, with one restriction: a double quote can't appear inside a double-quoted value and a single quote can't appear inside a single-quoted value. Double quotes can appear inside single quotes, and single quotes inside double quotes.

Any number of spaces may appear between key value pairs, between the key name and the equals sign, and between the equals sign and the quote character that starts the key value.

I need to turn the string into an array that looks like:

$arrayName["KeyName1"] = "KeyValue1"
$arrayName["KeyName2"] = "KeyValue2"

etc.

I'm pretty sure it can be done with regular expressions but all my attempts have failed and I need some help (actually lots of help :-) to get this done and am hoping some of the amazing people here can provide that help or at least get me started.

David Husnian

4 Answers

7

Sure, no problem. Let's break it down:

\w+\s*=\s*

matches a keyword made of word characters (letters, digits, or underscores), followed by an equals sign (which might be surrounded by whitespace).

"[^"]*"

matches an opening double quote, followed by any number of characters except another double quote, then a (closing) double quote.

'[^']*'

does the same for single quoted strings.

Combining that using capturing groups ((...)) with a simple alternation (|) gives you

(\w+)\s*=\s*("[^"]*"|'[^']*')

In PHP:

preg_match_all('/(\w+)\s*=\s*("[^"]*"|\'[^\']*\')/', $subject, $result, PREG_SET_ORDER);

fills $result with an array of matches. $result[n] will contain the details of the nth match, where

  • $result[n][0] is the entire match
  • $result[n][1] contains the keyword
  • $result[n][2] contains the value (including quotes)
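
To get from these matches to the array asked for in the question, the quotes still have to be stripped from each value. A minimal sketch, assuming a sample input of my own and reusing the question's $arrayName as the target:

```php
<?php
// Sample input (an assumption for illustration, modelled on the question).
$subject = 'KeyName1="KeyValue1"   KeyName2 = \'KeyValue2\'';

preg_match_all('/(\w+)\s*=\s*("[^"]*"|\'[^\']*\')/', $subject, $result, PREG_SET_ORDER);

$arrayName = array();
foreach ($result as $match) {
    // $match[2] still carries its delimiters, so drop the first and last character.
    // The (string) cast covers empty values on PHP versions where substr() returns false.
    $arrayName[$match[1]] = (string) substr($match[2], 1, -1);
}

print_r($arrayName);
```

Because the pattern guarantees that the value starts and ends with the same quote character, cutting one character off each end is enough here.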

Edit:

To match the value part without its quotes, regardless of the kind of quotes that are used, you need a slightly more complicated regex that uses a backreference and a negative lookahead assertion:

(\w+)\s*=\s*(["'])((?:(?!\2).)*)\2

In PHP:

preg_match_all('/(\w+)\s*=\s*(["\'])((?:(?!\2).)*)\2/', $subject, $result, PREG_SET_ORDER);

with the results

  • $result[n][0]: entire match
  • $result[n][1]: keyword
  • $result[n][2]: quote character
  • $result[n][3]: value

Explanation:

(["'])    # Match a quote (--> group 2)
(         # Match and capture --> group 3...
 (?:      # the following regex:
  (?!\2)  # As long as the next character isn't the one in group 2,
  .       # match it (any character)
 )*       # any number of times.
)         # End of capturing group 3
\2        # Then match the corresponding quote character.
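
As a rough check of the backreference-based pattern above, here is a sketch that wraps it in a function and feeds it values containing the "other" quote character, an equals sign, and spaces. The function name parseAttributes and the sample string are my own assumptions, not part of the answer:

```php
<?php
// Hypothetical helper name, chosen for illustration only.
function parseAttributes($subject)
{
    $pairs = array();
    $pattern = '/(\w+)\s*=\s*(["\'])((?:(?!\2).)*)\2/';
    if (preg_match_all($pattern, $subject, $result, PREG_SET_ORDER)) {
        foreach ($result as $match) {
            // Group 1 is the keyword, group 3 the value without its quotes.
            $pairs[$match[1]] = $match[3];
        }
    }
    return $pairs;
}

// Values with an embedded single quote, embedded double quotes, and "=" plus a space.
$subject = 'a = "it\'s"  b=\'say "hi"\' c="x=y z"';
var_export(parseAttributes($subject));
```

Note that . does not match a newline by default, so values spanning several lines would probably need the s modifier added to the pattern.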
Tim Pietzcker
  • Thanks so much for the help and for the excellent way you presented it, it really helped make it clear - and made me feel a little foolish for not being able to figure it out myself :-) I've designed and architected multi-country applications for huge corporations, I've designed enterprise data warehouses with accompanying business intelligence applications but regular expressions are always very hard for me (they're sort of a slightly easier APL!) – David Husnian Jun 10 '13 at 21:04
  • @DavidHusnian: Sure, you're welcome. I've added a regex that will capture the value for you without the quotes as well. – Tim Pietzcker Jun 11 '13 at 05:46
2

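A small variant of Tim Pietzcker's approach, using a branch reset group (?|...) so that both alternatives capture into the same group number: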

preg_match_all('/(\w+)\s*=\s*(?|"([^"]*)"|\'([^\']*)\')/', $subject, $result, PREG_SET_ORDER);

Then $result[n][2] contains the value without the quotes.
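
Since the key and the unquoted value now sit in fixed positions of every match, the final array can be built without an explicit loop. A sketch, assuming PHP 5.5 or later for array_column() and a sample input of my own:

```php
<?php
// Sample input (an assumption for illustration); the second value contains double quotes.
$subject = 'KeyName1="KeyValue1" KeyName2=\'Key "Value" 2\'';

preg_match_all('/(\w+)\s*=\s*(?|"([^"]*)"|\'([^\']*)\')/', $subject, $result, PREG_SET_ORDER);

// array_column() takes element 2 of each match as the value and element 1 as the key.
$arrayName = array_column($result, 2, 1);

var_export($arrayName);
```

If a key name occurs more than once in the string, the later occurrence overwrites the earlier one in the resulting array.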

Casimir et Hippolyte
1

How to

You can use back references for what you need, see this pattern:

/\b(\w+)\s*=\s*('|\")(.*)\2/U

\b is a word boundary and (\w+) captures the key. It is followed by optional whitespace, =, optional whitespace, then a captured apostrophe or quote, then anything up to \2, which refers back to the second capture group (the apostrophe or quote). The U modifier makes the quantifiers ungreedy.

Example

  // match the key-value pairs
  $text = "mykey1= \"my'value1'\"  mykey2 = 'my\"value2' mykey3=\"my value3\"";
  preg_match_all("/\b(\w+)\s*=\s*('|\")(.*)\\2/U",$text,$matches);

  // produce result in format you need
  $result = array();
  for($i=0; $i<count($matches[0]); ++$i) {
    $result[$matches[1][$i]] = $matches[3][$i];
  }

Result

Array
(
    [mykey1] => my'value1'
    [mykey2] => my"value2
    [mykey3] => my value3
)
Jan Turoň
0

Output wanted:

$arrayName["KeyName1"] = "KeyName1"
$arrayName["KeyName2"] = "KeyName2"

I hope that you meant:

$arrayName["KeyName1"] = "KeyValue1"
$arrayName["KeyName2"] = "KeyValue2"

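function paramStringToArray($string) {
    // Split on spaces and drop empty pieces. This assumes no value contains
    // a space and that no spaces surround the '='.
    $array = array_filter(explode(' ', $string));

    $result = array();
    foreach($array as $value) {
        // Split each piece into key and value. This assumes no value contains '='.
        $data = explode('=', $value);
        // Strip the surrounding double or single quotes from the value.
        $data[1] = trim($data[1],'"');
        $data[1] = trim($data[1],'\'');
        $result[$data[0]] = $data[1];
    }
    return $result;
}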

$string = 'KeyName1="KeyValue1" KeyName2=\'KeyValue2\'';

echo '<pre>';
var_dump(paramStringToArray($string));
echo '</pre>';

Output:

array(2) {
  ["KeyName1"]=>
  string(9) "KeyValue1"
  ["KeyName2"]=>
  string(9) "KeyValue2"
}
JimL
  • Thanks Jim. You're right I meant key value and have updated the post. I don't think explode will work because an equal sign is a possibility inside the key value. – David Husnian Jun 09 '13 at 20:22