Code:
<span class="_c1subtitle_c1">Name<br> 5132 - 0918</span>
I'm using a regex to extract the text, but I can't figure out a pattern that will extract only 5132.
I tried:
$pattern3 = '/<span class="_c1subtitle_c1">(*?)<\/span>/s';
If you want to match digits only you could try this pattern:
$s = ' <span class="_c1subtitle _c1">Name<br> 5132 - 0918</span>';
preg_match("/<br>\s*(\d+)\s*-/", $s, $matches);
$digits = $matches ? $matches[1] : NULL;
var_dump($digits);
The pattern "/<br>\s*(\d+)\s*-/" will match a sequence of digits between the first <br> and the next -. Note that leading and trailing whitespace will not be included in the captured value because \s* consumes it outside of the capture group (\d+).
To capture everything verbatim between the <br> and the first -, you could use "/<br>(.+?)-/" as the pattern:
preg_match("/<br>(.+?)-/", $s, $matches);
$text = $matches ? $matches[1] : NULL;
var_dump($text);
which will show that the whitespace has also been captured.
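As a minimal sketch, the digits-only pattern above could be wrapped in a small function that returns null when the markup does not have the expected shape. The function name extractLeadingNumber and the second sample string are hypothetical and not part of the original post.

```php
<?php
// Hypothetical helper (the name is an assumption for illustration only).
// Returns the digits that follow a <br> and precede a hyphen,
// or null when the input does not contain that shape.
function extractLeadingNumber(string $html): ?string
{
    if (preg_match('/<br>\s*(\d+)\s*-/', $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

$s = '<span class="_c1subtitle_c1">Name<br> 5132 - 0918</span>';
var_dump(extractLeadingNumber($s));                   // string(4) "5132"
var_dump(extractLeadingNumber('<span>Name</span>'));  // NULL
```

Comparing the return value of preg_match() against 1 keeps the "no match" (0) and "error" (false) cases on the same null path.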
Parsing data from HTML is not really the best candidate for regex; you should use an HTML parser for that.
But if you still want to do it with regex, you can use something like /<br>(.*?)-/
This searches for the text <br> and captures anything up to the first "-" sign into $matches[1].
If you only want to capture the digits, a narrower capture group such as (\d+) is a better fit than (.*?).
$text = ' <span class="_c1subtitle _c1">Name<br> 5132 - 0918</span>';
$result = preg_match('/<br>(.*?)-/', $text, $matches);
var_dump($result); // 1 if a match was found, 0 otherwise
var_dump($matches); // array: index 0 is the full matched string, index 1 is the captured group
var_dump(trim($matches[1])); // 5132, which is what you want in your case
For more information, I would recommend reading the DOM parser and preg_match documentation.
This works correctly:
$text = '<span class="_c1subtitle_c1">Name<br> 5132 - 0918</span>';
// preg_match() itself returns 1 or 0; the matched text ends up in $matches[0]
preg_match("/\s\d+\s(?=\-)/", $text, $matches);
echo trim($matches[0]);
\s matches any whitespace character
\d+ matches one or more digits (\d is equal to [0-9]); the + quantifier matches between one and unlimited times, as many times as possible, giving back as needed
\- matches the character - literally; inside the lookahead (?=...), it must follow the match but is not included in it
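To make the effect of the lookahead concrete, here is a small sketch that runs the same pattern with and without it. The variable names $withLookahead and $withoutLookahead are chosen for illustration only.

```php
<?php
$text = '<span class="_c1subtitle_c1">Name<br> 5132 - 0918</span>';

// With the lookahead, the hyphen is required but is not part of the match.
preg_match('/\s\d+\s(?=-)/', $text, $withLookahead);

// Without the lookahead, the hyphen is consumed and appears in the match.
preg_match('/\s\d+\s-/', $text, $withoutLookahead);

var_dump($withLookahead[0]);    // string(6) " 5132 "
var_dump($withoutLookahead[0]); // string(7) " 5132 -"
```

In both cases the surrounding whitespace is part of the match because \s sits outside any capture group, so trim() or a capture group around \d+ is still needed to get the bare number.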
Instead of using a regex, you could use DOMDocument and DOMXPath to get the second text() node in the span.
The text node will give you 5132 - 0918.
To get 5132, you could use explode with - as the delimiter.
// Html from your curl request.
$html = <<<HTML
<span class="_c1subtitle_c1">Name<br> 5132 - 0918</span>
HTML;
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$secondTextNode = $xpath->query('//span[@class="_c1subtitle_c1"]/text()')->item(1);
if ($secondTextNode) {
    // trim() removes the whitespace that surrounds the number in the text node
    $result = trim(explode("-", $secondTextNode->nodeValue)[0]);
    echo $result;
}
That will give you:
5132
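A variation on the same idea, as a sketch only: the XPath positional predicate text()[2] selects the second text node directly, and both halves of the value are trimmed after splitting. The call to libxml_use_internal_errors() is an assumption about real-world pages, which often contain markup that the parser warns about; it is not needed for this small sample.

```php
<?php
$html = '<span class="_c1subtitle_c1">Name<br> 5132 - 0918</span>';

$doc = new DOMDocument();
// Assumption: real pages may be invalid HTML, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// text()[2] is the second text node child of the span (XPath positions start at 1).
$node = $xpath->query('//span[@class="_c1subtitle_c1"]/text()[2]')->item(0);

$number = null;
if ($node !== null) {
    // Split on the hyphen and strip the whitespace around each part.
    $parts = array_map('trim', explode('-', $node->nodeValue));
    $number = $parts[0];
}
var_dump($number); // string(4) "5132"
```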
Simply preg_match('/<br>.*?(\d+)/', '<span class="_c1subtitle_c1">Name<br> 5132 - 0918</span>', $number)
This will get you started in the right direction.
$ch = curl_init('http://www.example.com/some.html');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
preg_match_all('/<br>.*?(\d+)/', $html, $numbers);
print_r($numbers); # You will see your matches
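For reference, a sketch of what preg_match_all() puts into $numbers, using an inline string in place of the curl response. The second span is made up for illustration and does not come from the original page.

```php
<?php
// Stand-in for the downloaded page; the second span is hypothetical sample data.
$html = '<span class="_c1subtitle_c1">Name<br> 5132 - 0918</span>'
      . '<span class="_c1subtitle_c1">Other<br> 7744 - 1203</span>';

preg_match_all('/<br>.*?(\d+)/', $html, $numbers);

// $numbers[0] holds the full matches, $numbers[1] the captured digit groups.
print_r($numbers[1]);
```

With a real request, curl_exec() returns false on failure, so it is worth checking $html before passing it to preg_match_all().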
`preg_match('/<br>.*?(\d+)/', '<span class="_c1subtitle_c1">Name<br> 5132 - 0918</span>', $number)` – Michael Niño Apr 08 '18 at 16:21