
Hey there, I have a small problem. I tried to run preg_match over a whole XML file to find specific words, but it doesn't work.

My XML:

 <product>
    <title>TestProduct</title>
    <Specifications>
      <item name="Specifications1">Test</item>
      <item name="Specifications2">Hello World</item>
    </Specifications>
    <body>
      <item name="Color">Black</item>
    </body>
 </product>

I would like to cut certain words out of the whole file using preg_match.

My PHP:

for ($i = 0; $i <= $length; $i++) {
   $var = $xml->product[$i];
   if (preg_match_all('/\b(\w*Test\w*)\b|\b(\w*Black\w*)\b/', $var, $result)) {
      // do something
   }
}

But it doesn't work. Only when I replace

  $var = $xml->product[$i];

with

  $var = $xml->product[$i]->Specifications->item;

does it match Test.

How can I fix that? I am out of ideas. Thanks for your help!

NeunNatter
  • All I can suggest is look into XPATH to help locate your elements better. – Scuzzy Nov 23 '17 at 21:07
  • [H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – ctwheels Nov 23 '17 at 21:16

2 Answers


Don't mess around with regular expressions; use a parser instead:

<?php

$xml = <<<DATA
 <product>
    <title>TestProduct</title>
    <Specifications>
      <item name="Specifications1">Test</item>
      <item name="Specifications2">Hello World</item>
    </Specifications>
    <body>
      <item name="Color">Black</item>
    </body>
 </product>
DATA;

# set up the DOM
$dom = new DOMDocument();
$dom->loadXML($xml);

# set up the xpath
$xpath = new DOMXPath($dom);

foreach ($xpath->query("*[contains(., 'Test')]") as $item) {
    print_r($item);
}
?>

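With the document as the context node, `*` selects only the root element, so this returns `<product>` whenever Test appears anywhere in its text content. Prefix the expression with `//` (for example `//*[contains(., 'Test')]`) to test every element at any depth.

The snippet sets up the DOM and uses XPath queries to look for the appropriate items, which you can then loop over.
To look up multiple strings, combine the conditions with `or`: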
foreach ($xpath->query("*[contains(., 'Test') or contains(., 'Black')]") as $item) {
    print_r($item);
}
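
Since the question asks about cutting words out of the file, here is a minimal sketch of one way that might look with the same DOM and XPath setup. It is an assumption about what "remove" should mean: it selects text nodes at any depth with `//text()` and strips the two words with `str_replace`. The variable names `$source`, `$nodes`, and `$matched` are made up for illustration.

```php
<?php

$source = <<<DATA
<product>
    <title>TestProduct</title>
    <Specifications>
      <item name="Specifications1">Test</item>
      <item name="Specifications2">Hello World</item>
    </Specifications>
    <body>
      <item name="Color">Black</item>
    </body>
</product>
DATA;

$dom = new DOMDocument();
$dom->loadXML($source);
$xpath = new DOMXPath($dom);

// Select every text node, at any depth, that contains one of the words.
$nodes = $xpath->query("//text()[contains(., 'Test') or contains(., 'Black')]");

$matched = [];
foreach ($nodes as $textNode) {
    // Remember which element the text belonged to and what it said.
    $matched[] = $textNode->parentNode->nodeName . ': ' . $textNode->nodeValue;
    // Strip the words from the text node in place (assumed meaning of "remove").
    $textNode->nodeValue = str_replace(['Test', 'Black'], '', $textNode->nodeValue);
}

print_r($matched);
echo $dom->saveXML();
```

Working on text nodes rather than elements keeps tag and attribute names out of the search, which a regex over the raw markup would not do.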
Jan
  • `$dom = new DOMDocument(); $dom->loadXML('Test.xml'); # set up the xpath $xpath = new DOMXPath($dom);` I tried it with that but it doesn't work? – NeunNatter Nov 23 '17 at 21:58
  • the error is: Warning: DOMDocument::loadXML(): Start tag expected, '<' not found in Entity, line: 1 – NeunNatter Nov 23 '17 at 22:02
  • @NeunNatter: `loadXML()` expects a string, you need `load()` here. – Jan Nov 24 '17 at 06:41
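
To illustrate the difference raised in the comments, here is a small sketch: `load()` takes a filename, while `loadXML()` takes the XML source itself. The temporary file is an assumption that stands in for `Test.xml` so the snippet runs on its own.

```php
<?php

// Hypothetical stand-in for Test.xml: write a small document to a temp file.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, '<product><title>TestProduct</title></product>');

// load() reads and parses the file at the given path.
$fromFile = new DOMDocument();
$fromFile->load($path);

// loadXML() parses a string that already contains XML source.
$fromString = new DOMDocument();
$fromString->loadXML(file_get_contents($path));

unlink($path);

echo $fromFile->documentElement->nodeName, "\n";
echo $fromString->documentElement->nodeName, "\n";
```

Passing a filename to `loadXML()` makes the parser treat the name itself as markup, which is where the "Start tag expected" warning comes from.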

The elegant way is to use XPath, as others have suggested.

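To respond strictly to your regex question: preg_match_all already searches the whole subject string, and the s modifier only changes whether . matches newlines, so it has no effect on this pattern. The more likely cause is that casting a SimpleXMLElement to a string returns only the text directly inside that element, not the text of its descendants, so $xml->product[$i] yields little more than whitespace. Passing the serialized element lets the pattern see the nested text (note that it then also sees tag and attribute names):

preg_match_all('/\b(\w*Test\w*)\b|\b(\w*Black\w*)\b/', $var->asXML(), $result)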
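
As a rough sketch of how the subject string affects the result, assuming SimpleXML is used as in the question: casting an element to a string gives only its direct text, while `asXML()` gives the full markup. The `<products>` wrapper and the names `$source`, `$direct`, and `$serialized` are assumptions added for illustration.

```php
<?php

$source = <<<DATA
<products>
 <product>
    <title>TestProduct</title>
    <Specifications>
      <item name="Specifications1">Test</item>
      <item name="Specifications2">Hello World</item>
    </Specifications>
    <body>
      <item name="Color">Black</item>
    </body>
 </product>
</products>
DATA;

$xml = simplexml_load_string($source);
$pattern = '/\b(\w*Test\w*)\b|\b(\w*Black\w*)\b/';
$product = $xml->product[0];

// String cast: only the text directly inside <product>, which is whitespace.
$direct = preg_match_all($pattern, (string) $product, $m1);

// asXML(): the whole serialized element, including nested text.
$serialized = preg_match_all($pattern, $product->asXML(), $m2);

echo "direct: $direct, serialized: $serialized\n";
print_r($m2[0]);
```

Because `asXML()` includes markup, a word that also occurs in a tag or attribute name would match too; the XPath approach avoids that.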
Zoli Szabó