1

I want to parse a string into an associative array using regex matching in PHP. Here is my code:

$string = "?\t\t\t\t\t\t?\t\t\t\t\t\t\t\t\t\t\t\t<?xml version=\"1.0\" encoding=\"UTF-8\"?><documents><Resp><gatewayId>g10060<\/gatewayId><accountId>310198232<\/accountId><orderNo>0970980541000510490500480<\/orderNo><tId><\/tId><tAmt>20<\/tAmt><result>1<\/result><respCode>21<\/respCode><signMD5>7ecd1eb9b870aaba3bfa45892095194e<\/signMD5><\/Resp><\/documents>";
preg_match_all('/<(.*?)>(.*?)<\\/(.*?)>/', $string, $arr);
echo json_encode($arr);

However, it only returns [[],[],[],[]], i.e. four empty arrays. I tested the regular expression on https://regex101.com/ and it gives the correct result there, but it does not work on my server.

What I want is:

[ "gatewayId" => "g10060",
  "accountId" => "310198232",
  "orderNo" => "0970980541000510490500480",
  "tId" => "",
  "tAmt" => "20",
  "result" => "1",
  "respCode" => "21",
  "signMD5" => "7ecd1eb9b870aaba3bfa45892095194e" ]

How can I fix this?

Peter Mortensen
Jacky Lau

3 Answers

4

Use:

<?php

$string = "?\t\t\t\t\t\t?\t\t\t\t\t\t\t\t\t\t\t\t<?xml version=\"1.0\" encoding=\"UTF-8\"?><documents><Resp><gatewayId>g10060<\/gatewayId><accountId>310198232<\/accountId><orderNo>0970980541000510490500480<\/orderNo><tId><\/tId><tAmt>20<\/tAmt><result>1<\/result><respCode>21<\/respCode><signMD5>7ecd1eb9b870aaba3bfa45892095194e<\/signMD5><\/Resp><\/documents>";
preg_match_all('#<([^\?>]+)>([^<]+)<\\\/[^>]+>#', $string, $arr);

list($_, $tags, $values)= $arr;

// As @billynoah said it's much less code
$result = array_combine($tags, $values);

/*
 * Old inefficient code commented
 *
$result = array_reduce(array_keys($tags), function($carry, $key) use ($tags, $values){
    $k = $tags[$key];
    $v = $values[$key];
    $carry[$k] = $v;
    return $carry;
},[]);
*/

var_dump($result);

Result (the empty `tId` element is not matched, because `[^<]+` requires at least one character):

array(7) {
  ["gatewayId"] => string(6) "g10060"
  ["accountId"] => string(9) "310198232"
  ["orderNo"]   => string(25) "0970980541000510490500480"
  ["tAmt"]      => string(2) "20"
  ["result"]    => string(1) "1"
  ["respCode"]  => string(2) "21"
  ["signMD5"]   => string(32) "7ecd1eb9b870aaba3bfa45892095194e"
}
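
The result above omits the empty `tId` element that the question asked for. A minimal sketch of one way to include it, assuming the tag names consist only of word characters: allow an empty value with `[^<]*` and require the closing tag to repeat the opening tag name with a backreference. The pattern is a variation for illustration, not the answer's original one.

```php
<?php

$string = "?\t\t\t\t\t\t?\t\t\t\t\t\t\t\t\t\t\t\t<?xml version=\"1.0\" encoding=\"UTF-8\"?><documents><Resp><gatewayId>g10060<\/gatewayId><accountId>310198232<\/accountId><orderNo>0970980541000510490500480<\/orderNo><tId><\/tId><tAmt>20<\/tAmt><result>1<\/result><respCode>21<\/respCode><signMD5>7ecd1eb9b870aaba3bfa45892095194e<\/signMD5><\/Resp><\/documents>";

// (\w+)    tag name; word characters only, so an escaped closing tag cannot be taken for an opening tag
// ([^<]*)  value, possibly empty
// <\\\\/   four backslashes in the PHP literal become two in the regex, i.e. one literal backslash before "/"
// \1       the closing tag must carry the same name as the opening tag
preg_match_all('~<(\w+)>([^<]*)<\\\\/\1>~', $string, $m);

// $m[1] holds the tag names, $m[2] the values.
$result = array_combine($m[1], $m[2]);

var_dump($result);
```

Wrapper elements such as `documents` and `Resp` are skipped because their content starts with another tag rather than with the escaped closing tag.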
Peter Mortensen
Wizard
3

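You need to double escape the backslash. The closing tags in the string contain a literal backslash (`<\/`), so the regex itself needs `\\`, and that pair has to be escaped again inside the PHP string literal. It also helps to use a non-slash delimiter for readability: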

preg_match_all('~<(.*?)>(.*?)<\\\/(.*?)>~', $string, $arr);
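
To see the two escaping levels side by side, here is a small sketch on a shortened sample taken from the question's string. The variable names `$n1`, `$n2`, `$m1` and `$m2` are only illustrative.

```php
<?php

// "\/" is not an escape sequence in a double-quoted PHP string,
// so the backslash is kept and the string really contains <\/tAmt>.
$string = "<tAmt>20<\/tAmt><result>1<\/result>";

// Question's pattern: PHP reduces the two backslashes to one, and PCRE then
// reads backslash-slash as an escaped delimiter, i.e. a plain "/".
// The string has no plain "</", so nothing matches.
$n1 = preg_match_all('/<(.*?)>(.*?)<\\/(.*?)>/', $string, $m1);

// Fixed pattern: PHP reduces three backslashes to two, and PCRE reads those
// as one literal backslash, followed by "/".
$n2 = preg_match_all('~<(.*?)>(.*?)<\\\/(.*?)>~', $string, $m2);

var_dump($n1, $n2, $m2[1], $m2[2]);
```

On this sample the first call finds no matches and the second finds both elements. On the full string from the question, the lazy `(.*?)` groups can also start at the XML declaration and span several tags, so a stricter tag pattern may be needed there.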
Peter Mortensen
  • For the record, if you want to use `/` as a delimiter here, you actually need 5 backslashes for the whole sequence: `preg_match_all('/<(.*?)>(.*?)<\\\\\/(.*?)>/', $string, $arr);`. It's a bit absurd, which is why I'd just pick a different delimiter. Glad it helped. – But those new buttons though.. Feb 20 '17 at 02:55
2

First of all, regex is not the best tool for parsing XML strings. I think SimpleXML would be much easier:

$object = new SimpleXMLElement($xmlString);

I've read your comments. If I were you, I would clean up the string and treat it as XML. Otherwise you will end up going in circles, changing the regex rules every time something in the response changes. Trim it and replace whatever is needed to make it valid XML, or try to get valid XML directly from the source.
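
A rough sketch of that cleanup for the string in the question, assuming the only problems are the junk before the XML declaration and the escaped slashes in the closing tags. Other responses may need different cleanup steps.

```php
<?php

$string = "?\t\t\t\t\t\t?\t\t\t\t\t\t\t\t\t\t\t\t<?xml version=\"1.0\" encoding=\"UTF-8\"?><documents><Resp><gatewayId>g10060<\/gatewayId><accountId>310198232<\/accountId><orderNo>0970980541000510490500480<\/orderNo><tId><\/tId><tAmt>20<\/tAmt><result>1<\/result><respCode>21<\/respCode><signMD5>7ecd1eb9b870aaba3bfa45892095194e<\/signMD5><\/Resp><\/documents>";

// Drop everything before the XML declaration.
// strstr() returns false if "<?xml" is missing; that case is not handled here.
$xml = strstr($string, '<?xml');

// Turn the escaped closing tags (backslash + slash) back into plain slashes.
$xml = str_replace('\/', '/', $xml);

$doc = new SimpleXMLElement($xml);

// Collect the child elements of <Resp> as name => text pairs.
$result = [];
foreach ($doc->Resp->children() as $name => $node) {
    $result[$name] = (string) $node;
}

var_dump($result);
```

Because the XML parser handles the structure, the empty `tId` element comes through as an empty string without any special casing.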

Catalin Minovici