<a data-track='' _sp= class=s-item__link href=get_this_href>...</a>

In the link above, data-track contains some JSON data. The _sp= value can contain numbers, letters, and a period (.). The class is s-item__link.

I need the value of the href attribute (get_this_href); I can take it from there.

This is the regex I tried, but I'm stuck from here.

<a\b(?=[^>]* class="[^"]*(?<=[" ])s-item__link[" ])(?=[^>]* href="([^"]*))

Here is an example: https://regex101.com/r/rVPeUI/1

$link = ""; // URL I'm scraping
$html = file_get_html($link);
// find() is part of simple_html_dom.php. Each li item is an $item.

foreach ($html->find('li.s-item') as $item) {
    // $item contains a fair number of nested divs with spans and links.
}
letsCode

1 Answer


Rather than using regex, it's better to use DOMDocument to parse HTML:

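// loadHTML() is an instance method and expects the markup as a string.
$doc = new DOMDocument();
// Collect libxml warnings about imperfect real-world HTML instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Matches <a> elements whose class attribute is exactly "s-item__link".
$query = "//a[@class='s-item__link']";
$entries = $xpath->query($query);
foreach ($entries as $entry) {
  echo "HREF " . $entry->getAttribute("href");
}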
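If the links have to be read one list item at a time, as in the question's loop, the same idea can be applied per item by passing each li as the context node to DOMXPath::query(). The following is only a sketch under some assumptions: the sample markup, the example.com URLs, and the helper name extractItemLinks are made up for illustration, and the class test matches s-item__link as one token among several rather than as the exact attribute value. It needs PHP 7.4 or later because of the arrow function.

```php
<?php
// Hypothetical sample markup standing in for the scraped page.
$html = <<<HTML
<ul>
  <li class="s-item"><a data-track='{"id":1}' _sp="p2351460.m1686" class="s-item__link" href="https://example.com/item/1">One</a></li>
  <li class="s-item"><a class="s-item__link extra" href="https://example.com/item/2">Two</a></li>
  <li class="s-item"><a class="other" href="https://example.com/skip">Skip</a></li>
</ul>
HTML;

// Hypothetical helper: returns the href of the first s-item__link anchor in each li.s-item.
function extractItemLinks(string $html): array
{
    // Collect parser warnings internally so imperfect HTML does not emit notices.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // XPath 1.0 idiom for "class attribute contains this whole token".
    $hasClass = fn(string $name) => "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";

    $hrefs = [];
    foreach ($xpath->query("//li[" . $hasClass('s-item') . "]") as $item) {
        // The leading "." makes the query relative to the current li.
        $links = $xpath->query(".//a[" . $hasClass('s-item__link') . "]", $item);
        if ($links->length > 0) {
            $hrefs[] = $links->item(0)->getAttribute('href');
        }
    }
    return $hrefs;
}

print_r(extractItemLinks($html));
```

If the page is already loaded through simple_html_dom, the markup would first have to be turned back into a string before DOMDocument can parse it.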
Wasif
  • Thank you for your response. This doesn't play well with the current code I have. I am looping through a UL tag. I will update my question with this code. – letsCode Oct 16 '20 at 02:55