combine two different regex patterns

Question

I'm using cURL to extract data from a website and I want to get the content of a specific <span>. It worked perfectly with:

$pattern4 = '/<span class=\"_c1 ei_card_subtitle _c1\">(.*?)<\/span>/i';
$cc = preg_match_all($pattern4, $ccpage, $matches);

print_r($matches[1][0]);

This returns: some text - digits. On the original website they're separated by <br />. I already have a pattern that matches only the digits:

$pattern5 = '/\s\d+\s(?=\-)/';

but I don't know how to combine them to get only the digits from that specific element:

<span class="_c1 ei_card_subtitle _c1">

get digits from ``? can you show your html input that shows the content of the span — Scott Weaver, Apr 13 '18 at 17:44
I used [dom-crawler](https://symfony.com/doc/current/components/dom_crawler.html) component for crawling websites. — Saeed M., Apr 13 '18 at 17:46

score 1 · Answer 1 · edited Apr 13 '18 at 18:51

I think it would be better to use DOMDocument to scrape the HTML; see "Grabbing the href attribute of an A element" for an example. Here is a solution for your problem:

<?php
$html = '<html><head></head><body><span class="_c1 ei_card_subtitle _c1">some text - 128</span></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$textContent = '';
foreach ($dom->getElementsByTagName('span') as $node) {
    if ($node->getAttribute('class') == '_c1 ei_card_subtitle _c1') {
        $textContent = $node->textContent;
        break;
    }
}
if ($textContent) {
    $pattern = '/\d+/';
    if (preg_match($pattern, $textContent, $matches)) {
        var_dump($matches[0]);
    }
}
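
A variation on the approach above, as a minimal sketch: assuming the value sits after a <br /> inside the span and may contain X placeholders (as in 713286XXX971, mentioned in the comments on the other answer), DOMXPath can select the text node that follows the <br>. The sample HTML and the variable names here are hypothetical; the real page markup may differ.

```php
<?php
// Hypothetical sample input; the real page markup may differ.
$html = '<html><head></head><body><span class="_c1 ei_card_subtitle _c1">some text<br />713286XXX971</span></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select the first text node that follows the <br> inside the target span.
$nodes = $xpath->query('//span[@class="_c1 ei_card_subtitle _c1"]/br/following-sibling::text()[1]');

$value = null;
if ($nodes !== false && $nodes->length > 0) {
    // Keep only a run of digits and X characters from that text node.
    if (preg_match('/[\dX]+/', $nodes->item(0)->nodeValue, $matches)) {
        $value = $matches[0];
    }
}

var_dump($value);
```

Selecting the text node after the <br> means a number appearing in the leading text is never considered, so the regex only has to clean up the value itself.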

edited Apr 13 '18 at 18:51 by Casimir et Hippolyte
answered Apr 13 '18 at 18:15 by Ermac

oblig: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Scott Weaver Apr 13 '18 at 18:18
Too much code to extract a simple set of numbers; I prefer regex – Timo Apr 13 '18 at 18:21
Hi, I sometimes agree with that too. If your HTML is short it may be too much, but you can consider it for longer HTML, and it may be a safer method whenever the HTML changes over time. In my opinion the regexp may be fine too :) – Ermac Apr 13 '18 at 18:24

Scott Weaver · Accepted Answer · 2018-04-13T18:07:03.380

-1

maybe something like:

<span class=\"_c1 ei_card_subtitle _c1\">.*?([\dX]+).*?<\/span>

regex101 demo

Another (possibly safer) pattern uses the <br/> tag to avoid matching too early (say, if the text itself has a number in it):

<span class=\"_c1 ei_card_subtitle _c1\">.*?<br\s?\/>\s([\dX]+).*?<\/span>

demo
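
A minimal sketch of how the second pattern might be applied in PHP with a single preg_match call. The sample input is hypothetical, $ccpage is the variable name from the question, and two small changes are assumptions rather than part of the pattern above: \s* replaces \s so the match does not depend on exactly one whitespace character after the <br />, and the s flag lets . match across line breaks.

```php
<?php
// Hypothetical sample input; the real page markup may differ.
$ccpage = '<div><span class="_c1 ei_card_subtitle _c1">some text 42<br />
713286XXX971</span></div>';

// One pattern: anchor on the span, skip past the <br/>, capture digits and X.
// The "s" flag lets "." match newlines; "i" mirrors the question's pattern.
$pattern = '/<span class="_c1 ei_card_subtitle _c1">.*?<br\s?\/>\s*([\dX]+).*?<\/span>/is';

$number = null;
if (preg_match($pattern, $ccpage, $matches)) {
    // Group 1 holds only the value after the <br/>, not the "42" before it.
    $number = $matches[1];
}

echo $number, PHP_EOL;
```

Because the capture group sits after the <br/>, the number in the leading text is skipped and only 713286XXX971 is captured.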

edited Apr 13 '18 at 18:07
answered Apr 13 '18 at 17:54 by Scott Weaver

Great one, but I need to capture only 713286XXX971 – Timo Apr 13 '18 at 18:00
Yes, I want to capture the whole number, including the X – Timo Apr 13 '18 at 18:01
