
Good morning. I need to extract some data from a website. I have tried several solutions, but so far I haven't found one that works. This is the HTML:

    <tr><td class="h-text-left"><a href="/soccer/peru/liga-1/binacional-llacuabamba/YasJ57j7/" class="in-match"><span><strong>Binacional</strong></span> - <span>Llacuabamba</span></a></td><td class="h-text-center"><a href="/soccer/peru/liga-1/binacional-llacuabamba/YasJ57j7/">2:1</a></td><td class="table-main__odds colored" data-oid="3o4fmxv464x0x9r5fh"><span><span><span data-odd="2.16"></span></span></span></td><td class="table-main__odds" data-oid="3o4fmxv498x0x0" data-odd="3.31"></td><td class="table-main__odds" data-oid="3o4fmxv464x0x9r5fi" data-odd="3.13"></td><td class="h-text-right h-text-no-wrap">Yesterday</td></tr>
    <tr><td class="h-text-left"><a href="/soccer/peru/liga-1/carlos-stein-atletico-grau/EwcmMDIc/" class="in-match"><span>Carlos Stein</span> - <span>Grau</span></a></td><td class="h-text-center"><a href="/soccer/peru/liga-1/carlos-stein-atletico-grau/EwcmMDIc/">1:1</a></td><td class="table-main__odds" data-oid="3o4cvxv464x0x9r5a3" data-odd="2.32"></td><td class="table-main__odds colored" data-oid="3o4cvxv498x0x0"><span><span><span data-odd="2.99"></span></span></span></td><td class="table-main__odds" data-oid="3o4cvxv464x0x9r5a4" data-odd="3.10"></td><td class="h-text-right h-text-no-wrap">Yesterday</td></tr>

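As you can see, there are td cells with class "table-main__odds colored" and td cells with class "table-main__odds", and they are not always in the same position. In the "colored" cell the data-odd attribute sits on a nested span instead of on the td itself. I tried this approach: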

    ...
    function print_odd($odd) {
        if (array_key_exists('data-odd', $odd->attr)) {
            return $odd->attr['data-odd'];
        }

        return $odd->children(0)->children(0)->children(0)->children(0)->attr['data-odd'];
    }
    ...
    $odd1 = print_odd($odds[$b++]);
    $odd2 = print_odd($odds[$b++]);
    $odd3 = print_odd($odds[$b++]);
    ...

This code worked for several years, but I think something has changed in the page's HTML. Any advice?

Thanks

Edit: this is the page address: link website

Marci
  • If the code worked for years, then something must have changed on the target website. We can't really help unless we know whether that is the case and, if so, what exactly changed – apokryfos Sep 19 '20 at 09:16
  • The HTML I posted is what the page looks like now; I don't remember what it looked like before. I am trying to update my code to match the new HTML, but I am not sure how to do it – Marci Sep 19 '20 at 09:42
  • Is that markup there on page load, or is it generated by JavaScript afterwards? You can't easily scrape it if it's the latter – apokryfos Sep 19 '20 at 10:29
  • No, it's static HTML. I edited my post and added the link. Thanks – Marci Sep 19 '20 at 10:41
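Comparing the posted HTML with the fallback line in print_odd, the code may simply go one level too deep: the data-odd span sits three levels below the td (td > span > span > span), but the question's code calls children(0) four times, so the fourth call has no child to return. The sketch below checks that depth with PHP's built-in DOMDocument and DOMXPath rather than the Simple HTML DOM-style API the question appears to use. Whether the site previously had four nested levels is an assumption, not something the question confirms.

```php
<?php
libxml_use_internal_errors(true); // silence parser warnings on real-world HTML

// Sample "colored" cell copied from the HTML in the question.
$html = '<table><tr><td class="table-main__odds colored" data-oid="3o4fmxv464x0x9r5fh">'
      . '<span><span><span data-odd="2.16"></span></span></span></td></tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$td = $xpath->query('//td')->item(0);

// Three levels of <span> below the <td>: this is where data-odd lives now.
$three = $xpath->evaluate('string(span/span/span/@data-odd)', $td);

// Four levels, mirroring the four children(0) calls: no element exists there.
$four = $xpath->query('span/span/span/span', $td)->length;

echo "three levels: '$three'\n";     // three levels: '2.16'
echo "four levels: $four node(s)\n"; // four levels: 0 node(s)
```

If this matches what you see, dropping one children(0) call from the fallback line may be enough, although a lookup that does not depend on the exact nesting depth is more robust if the markup changes again.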

1 Answer


I assume the problem is that the inner HTML of the <td>s has changed, or varies between cells. Sometimes you have <td data-odd="..."> and other times <td><span>...<span data-odd="...">. In that case, you can update your function to use a regular expression with preg_match to capture the data-odd="..." value from the cell's HTML. For example:

    /*
     I assume the $odd parameter is a <td> DOMElement (PHP's built-in DOM extension).
     Let's say $odd is a <td> with this structure:
    <td class="table-main__odds colored" data-oid="3o4cvxv498x0x0">
      <span><span><span data-odd="2.99"></span></span></span>
    </td>
    */

    function print_odd(DOMElement $odd) {
        // if the <td> itself has a data-odd attribute, this is enough
        // (with Simple HTML DOM, keep the $odd->attr check from the question instead)
        if ($odd->hasAttribute('data-odd')) {
            return $odd->getAttribute('data-odd');
        }

        // otherwise, serialize the <td> including its nested <span>s to a string
        // see https://stackoverflow.com/questions/2087103/how-to-get-innerhtml-of-domnode/39193507
        // (with Simple HTML DOM, $odd->outertext gives a similar string)
        $td_html = $odd->C14N();
        $regex = '/data-odd=\"([0-9.]+)\"+?/';

        // $matches[1] receives the first capture group: the number between the quotes
        preg_match($regex, $td_html, $matches);

        if ($matches) {
            return $matches[1]; // "2.99" (string)
        }

        // if nothing is found
        return false;
    }
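If the cells are parsed with PHP's DOM extension anyway, an XPath query can read data-odd from the td or from any element nested inside it, which avoids both the regex and the fixed children(0) chain. This is a sketch under assumptions: the odds cells are the td elements whose class contains table-main__odds, each row has three of them (1, X, 2), and the helper name odd_from_cell is hypothetical.

```php
<?php
libxml_use_internal_errors(true); // real-world pages often trigger parser warnings

// Hypothetical helper: returns the data-odd value found on the <td> itself or on
// any element inside it, or false when the cell has none.
function odd_from_cell(DOMXPath $xpath, DOMElement $td) {
    // string() of a node-set gives the value of the first node in document order,
    // so the <td>'s own attribute is preferred over a nested <span>'s.
    $value = $xpath->evaluate('string(descendant-or-self::*/@data-odd)', $td);
    return $value === '' ? false : $value;
}

// The two rows posted in the question, shortened to the odds cells.
$html = '<table>'
    . '<tr><td class="table-main__odds colored" data-oid="3o4fmxv464x0x9r5fh"><span><span><span data-odd="2.16"></span></span></span></td>'
    . '<td class="table-main__odds" data-oid="3o4fmxv498x0x0" data-odd="3.31"></td>'
    . '<td class="table-main__odds" data-oid="3o4fmxv464x0x9r5fi" data-odd="3.13"></td></tr>'
    . '<tr><td class="table-main__odds" data-oid="3o4cvxv464x0x9r5a3" data-odd="2.32"></td>'
    . '<td class="table-main__odds colored" data-oid="3o4cvxv498x0x0"><span><span><span data-odd="2.99"></span></span></span></td>'
    . '<td class="table-main__odds" data-oid="3o4cvxv464x0x9r5a4" data-odd="3.10"></td></tr>'
    . '</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//tr') as $tr) {
    // Matches class "table-main__odds" with or without the extra "colored" class.
    $cells = $xpath->query(".//td[contains(concat(' ', normalize-space(@class), ' '), ' table-main__odds ')]", $tr);
    $odds = [];
    foreach ($cells as $td) {
        $odds[] = odd_from_cell($xpath, $td);
    }
    $rows[] = $odds;
}

foreach ($rows as $odds) {
    echo implode(' | ', $odds), "\n";
}
// 2.16 | 3.31 | 3.13
// 2.32 | 2.99 | 3.10
```

Because the lookup does not depend on how many spans wrap the value, it keeps working whether the odd is on the td or nested at any depth, as long as the attribute is still called data-odd.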
verjas
  • Thanks for your answer, you caught exactly what I need to extract, but I need some explanation because I don't understand how to apply your code, i.e. $td_html = $odd->C14N(); $regex = '/data-odd=\"([0-9.]+)\"+?/'; preg_match($regex, $td_html, $matches); if ($matches) { return $matches[1]; // "2.99" (string) } // if nothing is found return false; Do I have to put this code into my function? Sorry, I don't usually use regex. Thanks again – Marci Sep 19 '20 at 14:03
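To see what the regex part does in isolation: preg_match only needs a string of HTML, and it fills $matches with the whole match at index 0 and the first parenthesized group at index 1. The sketch below runs the answer's pattern on the "colored" cell from the question; the input string is written by hand here, as an assumption of roughly what C14N() (or outertext) returns for that cell.

```php
<?php
// Roughly what the serialized "colored" <td> from the question looks like.
$td_html = '<td class="table-main__odds colored" data-oid="3o4cvxv498x0x0">'
         . '<span><span><span data-odd="2.99"></span></span></span></td>';

// Same pattern as in the answer: the text data-odd=" followed by digits and dots,
// with the digits and dots captured in group 1.
$regex = '/data-odd=\"([0-9.]+)\"+?/';

if (preg_match($regex, $td_html, $matches)) {
    echo $matches[0], "\n"; // data-odd="2.99"
    echo $matches[1], "\n"; // 2.99
} else {
    echo "no data-odd found\n";
}
```

So yes: in the answer's function, those lines sit inside print_odd, after the check for data-odd on the td itself and before the final return false.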