How to create a regex for this string

Question

<div id="plugin-description">
    <p itemprop="description" class="shortdesc">
        BuddyPress helps you build any type of community website using WordPress, with member profiles, activity streams, user groups, messaging, and more. </p>
    <div class="description-right">
                <p class="button">
            <a itemprop="downloadUrl" href="https://downloads.wordpress.org/plugin/buddypress.2.6.1.1.zip">Download Version 2.6.1.1</a>

I need to extract the description with a pattern like this:

<p itemprop="description" class="shortdesc">[a-z]</p>

I also need the download link:

<a itemprop="downloadUrl" href="[A-Z]"></a>

Don't parse HTML with a regular expression. Use a parser. – ʰᵈˑ, Jul 04 '16 at 15:53

score 0 · Answer 1 · edited May 23 '17 at 11:44

And once again:

<?php

$data = <<<DATA
<div id="plugin-description">
    <p itemprop="description" class="shortdesc">
        BuddyPress helps you build any type of community website using WordPress.
    </p>
    <div class="description-right">
        <p class="button">
            <a itemprop="downloadUrl" href=".zip">Download Version 2.6.1.1</a>
        </p>
    </div>
</div>
DATA;

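// Parse the markup; collect libxml warnings instead of printing them.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($data);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$containers = $xpath->query("//div[@id='plugin-description']");

foreach ($containers as $container) {
    // Query relative to the container; item(0) is null when nothing matches.
    $descNode = $xpath->query(".//p[@itemprop='description']", $container)->item(0);
    $linkNode = $xpath->query(".//a[@itemprop='downloadUrl']/@href", $container)->item(0);

    $description = $descNode ? trim($descNode->nodeValue) : '';
    $link = $linkNode ? $linkNode->nodeValue : '';
    echo $description . PHP_EOL . $link . PHP_EOL;
}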

?>

See a demo on ideone.com.

@Oms: See updated answer and demo link. – Jan, Jul 04 '16 at 18:38

score 0 · Answer 2 · answered Jul 04 '16 at 17:10

There are better tools for parsing HTML than regular expressions. That said, there are times when parsing HTML with regular expressions works safely and consistently, so don't be bullied out of trying it. These cases are usually for small, known sets of HTML markup.
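
As a rough illustration of that "small, known markup" case, here is a sketch in PHP (the language used in the other answer). It is not a general HTML parser: it assumes the attributes are double-quoted, that itemprop comes before href inside the <a> tag, and that the description <p> contains no nested <p>. The helper name extractPluginInfo and the variable names are made up for this example, and the sample markup is shortened from the question.

```php
<?php
// Sample markup, shortened from the question (note the unclosed tags).
$html = <<<MARKUP
<div id="plugin-description">
    <p itemprop="description" class="shortdesc">
        BuddyPress helps you build any type of community website using WordPress. </p>
    <div class="description-right">
        <p class="button">
            <a itemprop="downloadUrl" href="https://downloads.wordpress.org/plugin/buddypress.2.6.1.1.zip">Download Version 2.6.1.1</a>
MARKUP;

// Hypothetical helper: returns [description, url], with null for anything not found.
function extractPluginInfo(string $html): array
{
    $description = null;
    $url = null;

    // [^>]* tolerates extra attributes, (.*?) is lazy, and the s flag lets . span newlines.
    if (preg_match('~<p\b[^>]*\bitemprop="description"[^>]*>(.*?)</p>~s', $html, $m)) {
        $description = trim($m[1]);
    }

    // Capture the href value when it follows itemprop="downloadUrl" in the same tag.
    if (preg_match('~<a\b[^>]*\bitemprop="downloadUrl"[^>]*\bhref="([^"]*)"~', $html, $m)) {
        $url = $m[1];
    }

    return [$description, $url];
}

[$description, $url] = extractPluginInfo($html);
echo $description, PHP_EOL, $url, PHP_EOL;
```

If the page ever changes its quoting or attribute order, these patterns silently stop matching, which is the main reason a parser tends to be the safer default.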

For this particular case, using an HTML parser would be effective and leave you with more legible code. To illustrate this, I'll use a command-line tool called pup, which lets you retrieve your content pretty simply. Let's pretend that the markup is stored at /tmp/input on your computer.

To grab the downloadUrl...

pup < /tmp/input 'a[itemprop="downloadUrl"] attr{href}'

To grab the description...

pup < /tmp/input 'p[itemprop="description"] text{}'

I think this illustrates the simplicity and benefits of using an HTML parser to grab what you're after.
