How does preg_match work for HTML using simple_html_dom.php?

Question

I have HTML that looks like this:

<div class="address adr">
    <span class="street-address"><span class="no_ds"> CONTENT1</span>
        <span class="postal-code">CONTENT2</span>
        <span class="locality">CONTENT3</span>
    </span>
</div>

and

<div class="phone tel">
    <span class="no_ds">CONTENT4</span>
</div>

Can I use preg_match to get the class of both divs and, at the same time, the content inside them?

More generally, I want to know how preg_match works: what the backslashes mean, and what the other parts of a pattern do.

There are either too many possible answers, or good answers would be too long for this format. Please add details to narrow the answer set or to isolate an issue that can be answered in a few paragraphs. [Regular-Expressions.info](http://www.regular-expressions.info/) is a great resource. Also review the PHP manual. — Madara's Ghost, Mar 08 '14 at 17:46

score 2 · Answer 1 · edited Jun 20 '20 at 09:12

Parsing HTML using PHP

HTML is not a regular language and cannot be correctly parsed using regular expressions. Use an HTML parser to achieve this instead. In PHP, you have the DOMDocument class available by default. See this question for an extensive list of libraries that can be used to parse and process HTML.

Here's how you can extract the <div> class name using the DOMDocument class:

$html = <<<HTML
<div class="address adr">
    <span class="street-address"><span class="no_ds"> CONTENT1</span>
        <span class="postal-code">CONTENT2</span>
        <span class="locality">CONTENT3</span>
    </span>
</div>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('div') as $tag) {
    echo $tag->getAttribute('class'), PHP_EOL;
}

Output:

address adr

Using an HTML parser, you can parse and manipulate HTML in any way you want and be confident that it works. This is not the case with regular expressions. Your regex might break when the order of attributes changes, and it might fail on elements that are nested recursively, whereas an HTML parser will not.
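To cover what the question asks for (the class of each div together with the content inside it), the same parser can be combined with DOMXPath. The following is only a sketch: extractDivs() is a hypothetical helper name, the markup is the question's HTML with its tags closed, and it assumes the content of interest sits in the innermost spans.

```php
<?php
// Sketch: collect each <div>'s class attribute plus the text of the
// <span> elements that have no child elements (the innermost spans).
// extractDivs() is a hypothetical helper name, not part of any library.
function extractDivs(string $html): array
{
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $result = [];
    foreach ($xpath->query('//div[@class]') as $div) {
        $contents = [];
        // span[not(*)] selects spans that contain no other elements.
        foreach ($xpath->query('.//span[not(*)]', $div) as $span) {
            $contents[] = trim($span->textContent);
        }
        $result[$div->getAttribute('class')] = $contents;
    }
    return $result;
}

$html = <<<HTML
<div class="address adr">
    <span class="street-address"><span class="no_ds"> CONTENT1</span>
        <span class="postal-code">CONTENT2</span>
        <span class="locality">CONTENT3</span>
    </span>
</div>
<div class="phone tel">
    <span class="no_ds">CONTENT4</span>
</div>
HTML;

print_r(extractDivs($html));
```

The result is keyed by class name, so two divs with an identical class attribute would overwrite each other; a list of pairs would be safer if that can happen.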

Learning regular expressions

Regular expressions are a broad topic and can't be explained in a single answer. If you want to learn them, I suggest you start with a decent resource like Regular-Expressions.info.

For testing regular expressions, you can use an online tester such as Regex101.com or RegExr.com. To use them in your PHP script, there are the preg_* functions: preg_match(), preg_match_all(), preg_split() and preg_grep().
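As a small illustration of what the question calls "the backslashes and all the other things", here is a sketch of preg_match() on plain text (not HTML). The pattern and the sample strings are made up for this example.

```php
<?php
// Sketch: match a date such as "2014-03-08" and capture its parts.
$pattern = '/(\d{4})-(\d{2})-(\d{2})/';
// /.../   the slashes are delimiters marking where the pattern starts and ends
// \d      a backslash plus a letter is a shorthand class: \d means any digit
// {4}     a quantifier: exactly four of the preceding item
// (...)   a capture group: the text it matches is stored in $matches

$subject = 'edited on 2014-03-08 at 17:46';

// preg_match() returns 1 on a match, 0 on no match, and false on error.
if (preg_match($pattern, $subject, $matches) === 1) {
    echo $matches[0], PHP_EOL; // the whole match
    echo $matches[1], PHP_EOL; // the first capture group (the year)
}

// A backslash before a metacharacter makes it literal:
// \. matches a real dot, and $ anchors the match at the end of the string.
$isPhp = preg_match('/\.php$/', 'simple_html_dom.php');
```

preg_match() stops after the first match; preg_match_all() takes the same arguments and collects every match.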

score 0 · Answer 2 · answered Mar 08 '14 at 18:09

Check out the Simple HTML DOM manual; I'm sure it will help you: Documentation. Read everything carefully.

A P

This should be a comment. So gain 50 reputation quickly :) – HamZa Mar 08 '14 at 18:15
Sorry, didn't notice. – A P Mar 08 '14 at 18:52
