I want to extract the href of every anchor that has a particular class, such as link-wrapper.

For example, given a link like the following, I want its href:

<a href="blaa..blaa" class="link-wrapper">click here</a>

P.S. It should extract both links when they appear one after another, like:

<a href="blaa" class="link-wrapper">link-1</a><a href="blaa" class="link-wrapper">link-2</a>

I tried the solutions already posted on Stack Overflow, but none suited my problem, since some of them were in JavaScript and other languages. I looked into DOMDocument, but it's a bit difficult to adapt those solutions to my exact case.

I tried some preg_match patterns that didn't work for me, like:

preg_match('/<a(?:(?!class\=")(?:.|\n))*class\="(?:(?!link\-wrapper)(?:.|\n))*link\-wrapper(?:(?!<\/a>)(?:.|\n))*<\/a>/i', $content, $output_array);
Shubham
    Don't parse HTML using regex. Use something like [DOMDocument](https://www.php.net/manual/en/class.domdocument.php) instead. – M. Eriksson Mar 26 '19 at 05:52

1 Answer

You can use DOMDocument and DOMXPath to get your results. First load the HTML into a DOMDocument, then use an XPath query to find all the anchors whose class attribute contains link-wrapper, e.g.:

$html = '<a href="blaa..blaa" class="link-wrapper">click here</a><a href="not.blaa" class="something-else">link-3</a>
<a href="blaa" class="link-wrapper">link-1</a><a href="blaa..again" class="link-wrapper">link-2</a>';
// Parse the HTML fragment into a DOM tree
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Initialise the result array so it exists even when nothing matches
$urls = [];
// contains() is a substring test on the class attribute
foreach ($xpath->query('//a[contains(@class, "link-wrapper")]') as $a) {
    $urls[] = $a->getAttribute('href');
}
foreach ($urls as $url) {
    echo "$url\n";
}

Output:

blaa..blaa 
blaa 
blaa..again

Demo on 3v4l.org
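
One caveat with the query above: contains(@class, "link-wrapper") is a substring test, so it would also match a class such as link-wrapper-wide. If only the whole class token should match, a variation along these lines may help. This is a minimal sketch, not part of the original answer: the helper name extractHrefsByClass is hypothetical, the sample HTML is made up for illustration, and it assumes the class name contains no quote characters, since it is interpolated into the XPath string. It also uses libxml_use_internal_errors() to keep parse warnings quiet, which can be useful with real-world markup.

```php
<?php
// Hypothetical helper (not from the original answer): returns the href of
// every <a> whose class list contains $class as a whole token.
// Assumes $class contains no quote characters.
function extractHrefsByClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad the whitespace-normalized class attribute with spaces so that
    // only a complete token such as " link-wrapper " can match.
    $query = sprintf(
        '//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $urls = [];
    foreach ($xpath->query($query) as $a) {
        $urls[] = $a->getAttribute('href');
    }
    return $urls;
}

// Made-up sample: the second anchor only shares a prefix with the class.
$html = '<a href="blaa..blaa" class="link-wrapper">click here</a>'
      . '<a href="not.blaa" class="link-wrapper-wide">link-3</a>'
      . '<a href="blaa" class="btn link-wrapper">link-1</a>';

print_r(extractHrefsByClass($html, 'link-wrapper'));
```

With this sample, the link-wrapper-wide anchor is skipped, while the anchor that carries link-wrapper alongside another class is still picked up.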

Nick
  • Thank you so much, this code seems to fit the requirement. You also presented the solution very well. Thank you again. – Shubham Mar 26 '19 at 12:11