Parsing HTML using PHP
HTML is not a regular language and cannot be correctly parsed using regular expressions. Use an HTML parser instead. In PHP, the DOMDocument class is available by default. See this question for an extensive list of libraries that can be used to parse and process HTML.
Here's how you can extract the class name of the <div> using the DOMDocument class:
$html = <<<HTML
<div class="address adr">
<span class="street-address"><span class="no_ds"> CONTENT1</span>
<span class="postal-code">CONTENT2</span>
<span class="locality">CONTENT3</span>
</span>
</div>
HTML;
$dom = new DOMDocument;
// Parse the markup into a DOM tree
$dom->loadHTML($html);
// Print the class attribute of every <div> in the document
foreach ($dom->getElementsByTagName('div') as $tag) {
    echo $tag->getAttribute('class'), PHP_EOL;
}
Output:
address adr
With an HTML parser, you can parse and manipulate HTML however you want, and it keeps working when the markup is rearranged. That is not the case with regular expressions. Your regex might break when the order of attributes changes. Regular expressions can also fail on nested elements, which HTML allows to nest to any depth, whereas an HTML parser handles them.
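To make the attribute-order problem concrete, here is a small sketch. The sample tags, the deliberately naive pattern, and the helper names classViaRegex() and classViaDom() are all made up for illustration; a more careful pattern would survive this particular case, but the general fragility remains.

```php
<?php
// Two equivalent <div> tags; only the attribute order differs (illustrative input).
$variants = [
    '<div class="address adr" id="a1">x</div>',
    '<div id="a1" class="address adr">x</div>',
];

// Deliberately naive pattern: assumes class is the first attribute of the tag.
$naivePattern = '/<div class="([^"]*)"/';

// Hypothetical helper: class value via the naive regex, or null if it does not match.
function classViaRegex(string $html, string $pattern): ?string
{
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// Hypothetical helper: class value of the first <div>, read from the parsed DOM.
function classViaDom(string $html): ?string
{
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $div = $dom->getElementsByTagName('div')->item(0);
    return $div !== null ? $div->getAttribute('class') : null;
}

foreach ($variants as $html) {
    var_dump(classViaRegex($html, $naivePattern), classViaDom($html));
}
```

For the second tag the regex helper returns null because class is no longer the first attribute, while the DOM helper returns the same class value for both tags.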
Learning regular expressions
Regular expressions are a broad topic and can't be explained in a single answer. If you want to learn them, I suggest starting with a decent resource like Regular-Expressions.info.
For testing regular expressions, you can use an online tester such as Regex101.com or RegExr.com. For incorporating them into your PHP script, you can use the preg_* functions: preg_match(), preg_match_all(), preg_split() and preg_grep().
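As a quick orientation, here is a minimal sketch of the four functions named above, applied to plain text rather than HTML. The sample string, the patterns, and the variable names are invented for the example.

```php
<?php
// Illustrative input text (not from the question).
$text = 'Order 66 shipped on 2024-01-15; order 7 shipped on 2024-02-03.';
$datePattern = '/\d{4}-\d{2}-\d{2}/';

// preg_match(): stops at the first match; $first[0] holds the matched text.
preg_match($datePattern, $text, $first);

// preg_match_all(): collects every match; $all[0] is the list of full matches.
preg_match_all($datePattern, $text, $all);

// preg_split(): splits the string wherever the pattern matches.
$parts = preg_split('/\s*;\s*/', $text);

// preg_grep(): keeps the array elements that match, preserving their keys.
$words = ['apple', 'x42', 'pear', 'y7'];
$withDigits = preg_grep('/\d/', $words);

print_r($first);
print_r($all[0]);
print_r($parts);
print_r($withDigits);
```

Note that preg_grep() keeps the original array keys, so the result here is indexed 1 and 3 rather than 0 and 1; wrap it in array_values() if you need a reindexed list.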