
I am stuck on a preg_match problem in PHP. From the string below, I need to match 'index.php?ci_id=' and get the value that follows it (e.g. 161377 from index.php?ci_id=161377).

$str = '<h3>Resources</h3>
<p><a href="index.php?ci_id=161377">Announcing Upgraded Firmware for N3680 Decoded 2D Imager</a></p>
<p><a href="https://www.honeywellaidc.com/products/oem-scan-engines/2d-imagers/n3680-series">N3680 Product webpage</a></p>
<p><a href="index.php?ci_id=161376">N3680 Product datasheet</a></p>';
preg_match_all('#index.php?([^\s]+)"#', $str, $matches,PREG_OFFSET_CAPTURE);
print_r($matches[1]);

I need the output: 161377 161376

Thanks & regards Kaif

Kaif Khan
  • Parsing HTML with Regular Expressions..? Well, it's got to be done I suppose: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – CD001 May 03 '17 at 13:10
  • Or after calling the `preg_match_all(...)` -> `foreach($matches[1] as $currentMatch){ $mFinalArray[] = explode('=',$currentMatch[0])[1]; }` . If you `print_r($mFinalArray)` you have the numbers you want. – Antonios Tsimourtos May 03 '17 at 13:10
  • If you're *just* trying to get the id numbers though... `/index\.php\?ci_id=([0-9]+)/` – CD001 May 03 '17 at 13:15
  • try `/(?<=ci_id=)(.*)(?=")/` – Chetan Ameta May 03 '17 at 13:44
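The lookbehind idea from the last comment can be tightened to match only digits, so it no longer depends on a closing quote. This is a minimal sketch, not taken from the thread: it assumes the `$str` from the question, and the `\d+` variant is my own adjustment. Because the pattern has no capturing group, the ids land in `$matches[0]`.

```php
<?php
// Sample input copied from the question.
$str = '<h3>Resources</h3>
<p><a href="index.php?ci_id=161377">Announcing Upgraded Firmware for N3680 Decoded 2D Imager</a></p>
<p><a href="https://www.honeywellaidc.com/products/oem-scan-engines/2d-imagers/n3680-series">N3680 Product webpage</a></p>
<p><a href="index.php?ci_id=161376">N3680 Product datasheet</a></p>';

// (?<=ci_id=) is a lookbehind: "ci_id=" must precede the match
// but is not part of it, so each full match is just the digits.
preg_match_all('/(?<=ci_id=)\d+/', $str, $matches);

// No capturing groups, so the ids are the full matches in $matches[0].
$ids = $matches[0];
echo implode(' ', $ids), "\n";
```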

3 Answers


First of all, using regexes to parse HTML is generally a bad idea. It works here only because you are doing nothing more complex than finding a fixed piece of text, but avoid this tactic in the future, or you will end up trying to do something that cannot be done.

Besides the warning, you are simply looking in the wrong place. The preg_match documentation says:

If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.

So to find all the matches, you simply have to look in $matches[0] instead of $matches[1] (or look in every position of $matches from 1 onward).
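To see what each slot actually holds, here is a small illustrative sketch (my own, using the question's string) that runs the question's pattern without PREG_OFFSET_CAPTURE, so every entry is a plain string rather than a [text, offset] pair. Note that in this pattern the unescaped `.` matches any character, `p?` makes the final `p` optional, and the capture group starts right after `index.php`, so it includes the `?ci_id=` prefix.

```php
<?php
// Sample input copied from the question.
$str = '<h3>Resources</h3>
<p><a href="index.php?ci_id=161377">Announcing Upgraded Firmware for N3680 Decoded 2D Imager</a></p>
<p><a href="https://www.honeywellaidc.com/products/oem-scan-engines/2d-imagers/n3680-series">N3680 Product webpage</a></p>
<p><a href="index.php?ci_id=161376">N3680 Product datasheet</a></p>';

// The question's pattern, run with the default PREG_PATTERN_ORDER.
preg_match_all('#index.php?([^\s]+)"#', $str, $matches);

// $matches[0]: text matched by the whole pattern, including the trailing quote,
//              e.g. 'index.php?ci_id=161377"'
// $matches[1]: text matched by the first parenthesized group,
//              e.g. '?ci_id=161377'
print_r($matches);
```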

frollo

Thank you all for your support. Based on your comments, I found the answer:

$str = '<h3>Resources</h3>
<p><a href="index.php?ci_id=161377">Announcing Upgraded Firmware for N3680 Decoded 2D Imager</a></p>
<p><a href="https://www.honeywellaidc.com/products/oem-scan-engines/2d-imagers/n3680-series">N3680 Product webpage</a></p>
<p><a href="index.php?ci_id=161376">N3680 Product datasheet</a></p>';
preg_match_all('/index\.php\?ci_id=([0-9]+)/', $str, $matches,PREG_OFFSET_CAPTURE);
foreach ($matches[1] as $match)
{
    // With PREG_OFFSET_CAPTURE each entry is [captured text, offset], so [0] is the id
    echo '<br>' . $match[0];
}
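If the offsets are not needed, PREG_OFFSET_CAPTURE can simply be dropped; $matches[1] is then a flat list of the captured digits and can be joined into the exact output the question asks for. A minimal sketch, assuming the same `$str` and pattern as above:

```php
<?php
// Sample input copied from the question.
$str = '<h3>Resources</h3>
<p><a href="index.php?ci_id=161377">Announcing Upgraded Firmware for N3680 Decoded 2D Imager</a></p>
<p><a href="https://www.honeywellaidc.com/products/oem-scan-engines/2d-imagers/n3680-series">N3680 Product webpage</a></p>
<p><a href="index.php?ci_id=161376">N3680 Product datasheet</a></p>';

// Escaped "." and "?" match literally; ([0-9]+) captures only the id.
preg_match_all('/index\.php\?ci_id=([0-9]+)/', $str, $matches);

// Without PREG_OFFSET_CAPTURE, $matches[1] holds plain strings.
$ids = $matches[1];

// Space-separated, as requested in the question.
echo implode(' ', $ids), "\n";
```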
Kaif Khan

Don't use regex to parse HTML. DOMDocument and DOMXPath can do the work instead:

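$dom = new DOMDocument();
$dom->loadHTML($str);

$xpath = new DOMXPath($dom);
// Select the href attribute of every <a> whose link starts with "index.php?ci_id"
$hrefs = $xpath->evaluate('//a[starts-with(@href, "index.php?ci_id")]/@href');
foreach ($hrefs as $href) {
  // "index.php?ci_id=161377" -> keep the part after "="
  list(, $ci_id) = explode('=', $href->nodeValue);
  echo $ci_id . "<br>\n";
}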
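Splitting on "=" assumes ci_id is the only query parameter. As a possible variation (my own suggestion, not part of the answer), parse_url() and parse_str() can pull out ci_id by name even if other parameters are added later. This sketch also calls libxml_use_internal_errors(true) so that messy real-world HTML does not print parser warnings.

```php
<?php
// Sample input copied from the question.
$str = '<h3>Resources</h3>
<p><a href="index.php?ci_id=161377">Announcing Upgraded Firmware for N3680 Decoded 2D Imager</a></p>
<p><a href="https://www.honeywellaidc.com/products/oem-scan-engines/2d-imagers/n3680-series">N3680 Product webpage</a></p>
<p><a href="index.php?ci_id=161376">N3680 Product datasheet</a></p>';

// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

$ids = [];
foreach ($xpath->query('//a[starts-with(@href, "index.php?ci_id")]/@href') as $href) {
    // PHP_URL_QUERY returns only the part after "?", e.g. "ci_id=161377".
    $query = parse_url($href->nodeValue, PHP_URL_QUERY);
    // parse_str() decodes the query string into an associative array.
    parse_str($query, $params);
    $ids[] = $params['ci_id'];
}
echo implode(' ', $ids), "\n";
```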


splash58
  • Thanks for the reply. I checked, and it gives the same result as my regex. My question is: why shouldn't we use regex? As far as I know, DOMDocument takes longer to execute. Can you please explain (i.e., what are the disadvantages of regex)? – Kaif Khan May 03 '17 at 15:51
  • look there, for example - http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not – splash58 May 03 '17 at 17:03
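One concrete disadvantage is that a regex sees only raw characters, not HTML structure. The sketch below uses hypothetical input (not from the question) with one real link, one link inside an HTML comment, and one URL that appears only as plain text. The regex picks up all three, while XPath finds only the actual link. The DOM approach does pay for a full parse, so it may well be slower; the trade-off is correctness on input you do not control.

```php
<?php
// Hypothetical input: only the first ci_id is an actual link.
$html = '<p><a href="index.php?ci_id=1">Live link</a></p>
<!-- <a href="index.php?ci_id=2">Old link, commented out</a> -->
<p>Plain text mentioning index.php?ci_id=3</p>';

// Regex: matches the pattern anywhere in the raw text.
preg_match_all('/index\.php\?ci_id=([0-9]+)/', $html, $m);
$regexIds = $m[1];

// DOM: comments and text are separate node types, so //a finds only real links.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$domIds = [];
foreach ($xpath->query('//a[starts-with(@href, "index.php?ci_id=")]/@href') as $href) {
    list(, $id) = explode('=', $href->nodeValue);
    $domIds[] = $id;
}

echo 'regex: ', implode(' ', $regexIds), "\n";
echo 'DOM:   ', implode(' ', $domIds), "\n";
```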