-2

I'm using cURL to extract data from a website, and I want to get the content of a specific <span>. It worked perfectly with:

$pattern4 = '/<span class=\"_c1 ei_card_subtitle _c1\">(.*?)<\/span>/i';
$cc = preg_match_all($pattern4, $ccpage, $matches);

print_r($matches[1][0]);

This returns: some text - digits. On the original website they're separated by <br />. I already have a pattern that matches only the digits:

$pattern5 = '/\s\d+\s(?=\-)/';

But I don't know how to combine them to get only the digits from that specific element:

<span class="_c1 ei_card_subtitle _c1">

Casimir et Hippolyte
Timo

2 Answers

1

I think it would be better to use DOMDocument to scrape HTML; see "Grabbing the href attribute of an A element" for an example. Here is a solution for your problem:

<?php
// Sample markup standing in for the page fetched with cURL.
$html = '<html><head></head><body><span class="_c1 ei_card_subtitle _c1">some text - 128</span></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$textContent = '';
// Find the first <span> whose class attribute matches exactly.
foreach ($dom->getElementsByTagName('span') as $node) {
    if ($node->getAttribute('class') == '_c1 ei_card_subtitle _c1') {
        $textContent = $node->textContent;
        break;
    }
}
// Pull the first run of digits out of the span's text.
if ($textContent) {
    $pattern = '/\d+/';
    if (preg_match($pattern, $textContent, $matches)) {
        var_dump($matches[0]);
    }
}
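For a shorter variant of the same idea, the span can be selected directly with DOMXPath instead of looping over every <span>. This is only a sketch under assumptions: DOMXPath is swapped in here and is not used in the code above, the sample markup is hypothetical, and the `libxml_use_internal_errors()` call is added only because real pages often trigger parser warnings.

```php
<?php
// Hypothetical sample markup, mirroring the snippet above.
$html = '<html><head></head><body><span class="_c1 ei_card_subtitle _c1">some text - 128</span></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

// Select spans whose class attribute is exactly this string.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//span[@class="_c1 ei_card_subtitle _c1"]');

$digits = null;
if ($nodes !== false && $nodes->length > 0) {
    // Take the first run of digits from the first matching span.
    if (preg_match('/\d+/', $nodes->item(0)->textContent, $m)) {
        $digits = $m[0];
    }
}
var_dump($digits);
```

The exact-match test on `@class` behaves like the `getAttribute()` comparison above, so it would miss a span whose classes appear in a different order.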
Casimir et Hippolyte
Ermac
  • oblig: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Scott Weaver Apr 13 '18 at 18:18
  • Too much code to extract a simple set of numbers; I prefer regex – Timo Apr 13 '18 at 18:21
  • 1
    Hi, I sometimes agree with that too. If your HTML is short it may be too much, but you can consider it for longer HTML, since it may be a safer method if the HTML changes over time. In my opinion the regexp may be fine too :) – Ermac Apr 13 '18 at 18:24
-1

Maybe something like:

<span class=\"_c1 ei_card_subtitle _c1\">.*?([\dX]+).*?<\/span>

regex101 demo

Another (possibly safer) pattern uses the <br/> tag to avoid matching too early (say, if the text has a number in it):

<span class=\"_c1 ei_card_subtitle _c1\">.*?<br\s?\/>\s([\dX]+).*?<\/span>

demo
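To show how these patterns could be plugged into PHP, here is a minimal sketch using `preg_match_all` and the first capture group. The sample strings and the `extractDigits` helper are hypothetical, and the `i` flag from the question's pattern is left out on purpose: with it, the `X` in `[\dX]` would also match the "x" in "text".

```php
<?php
// Hypothetical sample inputs; the real markup on the page may differ.
$withDash = '<span class="_c1 ei_card_subtitle _c1">some text - 128</span>';
$withBr   = '<span class="_c1 ei_card_subtitle _c1">some text<br /> 128</span>';

// The two patterns from above, wrapped in "/" delimiters (no "i" flag).
$loose  = '/<span class=\"_c1 ei_card_subtitle _c1\">.*?([\dX]+).*?<\/span>/';
$strict = '/<span class=\"_c1 ei_card_subtitle _c1\">.*?<br\s?\/>\s([\dX]+).*?<\/span>/';

// Hypothetical helper: returns every first-group capture, or an empty array.
function extractDigits(string $pattern, string $html): array
{
    return preg_match_all($pattern, $html, $matches) ? $matches[1] : [];
}

$fromDash = extractDigits($loose, $withDash);
$fromBr   = extractDigits($strict, $withBr);
print_r($fromDash);
print_r($fromBr);
```

The stricter pattern returns nothing when the span contains no `<br/>`, so which one fits depends on what the fetched markup actually looks like.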

Scott Weaver