Hi everyone, I am having an issue using regex and cannot get it to work when there are spaces or line breaks in the content.

$content = "<dt><span>Name:</span></dt>
                      <dd>
                        John
                      </dd>
                      <dt><span>Age:</span></dt>
                      <dd>
                        40
                      </dd>
                      <dt><span>Sex:</span></dt>
                      <dd>
                        Male
                      </dd>";

The regex I am using is:

preg_match_all('/<dt><span>(.*)<\/span><\/dt><dd>(.*)<\/dd>/',$content, $output);
Chill Web Designs

2 Answers

Don't parse HTML with regex; use the DOM. Here's an example that will work if you are sure about the HTML structure.

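$dom = new DOMDocument();
// @ suppresses the warnings libxml raises for an incomplete HTML fragment.
@$dom->loadHTML($content);
$xpath = new DOMXPath($dom);
$spans = $xpath->query('//span');
$dds = $xpath->query('//dd');
// Pair each label with the value at the same index; trim() strips the
// indentation and line breaks around the text.
for ($i = 0; $i < $spans->length; $i++)
{
    echo trim($spans->item($i)->nodeValue) . ' ' . trim($dds->item($i)->nodeValue) . '<br>';
}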

If you are not sure of its structure, you'll need something a bit more complicated.
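
One possible shape for that more complicated version, as a rough sketch rather than a definitive solution: instead of pairing nodes by index, look up the first <dd> sibling that follows each <dt>. The helper name extractPairs and the sample input are made up for illustration, and the sketch still assumes every <dt> is followed by its own <dd>.

```php
<?php
// Hypothetical sample input, mirroring the markup in the question.
$content = '<dt><span>Name:</span></dt>
            <dd>
              John
            </dd>
            <dt><span>Age:</span></dt>
            <dd>
              40
            </dd>';

// Hypothetical helper: maps each <dt> label to the text of the <dd> after it.
function extractPairs(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $pairs = [];
    foreach ($xpath->query('//dt') as $dt) {
        // First <dd> sibling after this <dt>, evaluated relative to the <dt>.
        $dd = $xpath->query('following-sibling::dd[1]', $dt)->item(0);
        if ($dd === null) {
            continue; // no <dd> follows this label, so skip it
        }
        $pairs[trim($dt->textContent)] = trim($dd->textContent);
    }
    return $pairs;
}

print_r(extractPairs($content));
```

Because the lookup is relative to each <dt>, a stray <span> or <dd> elsewhere in the page no longer shifts the pairs out of step, which is the main weakness of matching by index.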

Ranty

I agree that you should use the DOM. However, your regex does not account for the whitespace between </dt> and <dd>.

Try:

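// \s* matches the spaces and line breaks around the tags; .*? is lazy and the s modifier lets . match newlines.
preg_match_all('/<dt><span>(.*?)<\/span><\/dt>\s*<dd>\s*(.*?)\s*<\/dd>/s', $content, $output);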
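
If you do stay with regex, here is a rough sketch of a whitespace-tolerant pattern and of reading the results back as label/value pairs. The sample input and the variable names are made up for illustration, and the pattern only holds while the markup keeps exactly this <dt><span>...</span></dt><dd>...</dd> shape.

```php
<?php
// Hypothetical sample input, mirroring the markup in the question.
$content = "<dt><span>Name:</span></dt>
            <dd>
              John
            </dd>
            <dt><span>Sex:</span></dt>
            <dd>
              Male
            </dd>";

// \s* absorbs spaces and line breaks around the tags; .*? is lazy, so each
// match stops at the first closing tag; the s modifier lets . match newlines.
$pattern = '/<dt><span>(.*?)<\/span><\/dt>\s*<dd>\s*(.*?)\s*<\/dd>/s';

// PREG_SET_ORDER groups the captures per match: [full match, label, value].
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$result = [];
foreach ($matches as $m) {
    $result[$m[1]] = $m[2];
}
print_r($result);
```

A plain .* between the tags is not enough on its own, because without the s modifier the dot does not match line breaks.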
Captain Payalytic