Regex to change only number inside a div

Question

I have this regex for now

/<div[^>]*>|(\d{1,9}\.\d{2})/

which detects the opening <div> tag and captures numbers in currency format.

If I have a string like this

<div class="foo">Hello world 4546.00 and 6596.45 bla bla bla</div>

I would like to replace only the numbers inside it, using preg_replace.

As you can see in the example below, I don't want the numbers outside of the div to be selected, only the ones inside.

https://regex101.com/r/AZd896/1/

@ctwheels I need to summon him because it will be helpful to finish my client's task from hell ;) — Warface, Dec 08 '17 at 18:57
What you're looking to do is outside the scope of the capabilities for regular expressions. Regexes can find patterns in code, but they can't get semantic meaning. See http://htmlparsing.com/regexes for examples of why you don't want to try to parse HTML with regexes. — Andy Lester, Dec 08 '17 at 19:47
@AndyLester Lord Satan who goes by the name of `ctwheels` gave me a good regex which helped me do my `preg_replace`. And it's working like a charm https://regex101.com/r/AZd896/2 — Warface, Dec 08 '17 at 19:53

ctwheels · Accepted Answer · 2017-12-08T19:53:42.360

Brief

I'm not sure why you got downvoted so quickly, but I can only assume it's because of this question's topic and its relationship with the question "RegEx match open tags except XHTML self-contained tags".

By no means is this the best answer, but, in the scope of your question, it does solve your issue.

Code

See regex in use here

(?:<div[^>]*>|\G(?!\A))(?:(?!</div>).)*?\K\d{1,9}\.\d{2}

If the <div> tag might span multiple lines, you can add the s modifier to allow . to match newline characters as seen here.
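
As a minimal sketch of how the pattern could be plugged into preg_replace, the snippet below uses the question's sample text with a line break added inside the <div>. The `~` delimiter, the surrounding "Total" lines, and the `X` replacement are assumptions made for this example, not part of the original answer.

```php
<?php
// Pattern from the answer, with the `s` modifier so `.` also matches newlines.
// `~` is used as the delimiter so the `/` in `</div>` needs no escaping.
$pattern = '~(?:<div[^>]*>|\G(?!\A))(?:(?!</div>).)*?\K\d{1,9}\.\d{2}~s';

// Sample input: one number before the <div>, two inside it (split over
// two lines), and one after it.
$input = "Total 12.50\n<div class=\"foo\">Hello world 4546.00 and\n6596.45 bla bla bla</div>\nTotal 99.99";

// 'X' is a placeholder replacement chosen for this example.
$output = preg_replace($pattern, 'X', $input);

echo $output, PHP_EOL;
// Total 12.50
// <div class="foo">Hello world X and
// X bla bla bla</div>
// Total 99.99
```

Only the two numbers between the opening and closing tags are replaced; the numbers on the "Total" lines are left alone.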

Results

Input

dasdfasdf 355.56 asdfasd
<div class="sdaf">sdfsad 36546545643.00 asdfa sdf sadfasdf 544.45 sadfs</div>
dasdfasdf 355.56 asdfasd

Output (each match replaced with an empty string)

dasdfasdf 355.56 asdfasd
<div class="sdaf">sdfsad 36 asdfa sdf sadfasdf  sadfs</div>
dasdfasdf 355.56 asdfasd

Note that 36 is left behind in the output: \d{1,9} allows at most nine digits before the dot, so only 546545643.00 is matched out of 36546545643.00.

Explanation

(?:<div[^>]*>|\G(?!\A)) Match either of the following
- <div[^>]*> Match the following
  - <div Match this literally
  - [^>]* Match any number of any character not present in the set (anything except >)
  - > Match this literally
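- \G(?!\A) Assert position at the end of the previous match. The negative lookahead (?!\A) stops \G from also matching at the very start of the string, where no <div> has been seen yet.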
(?:(?!</div>).)*? Tempered greedy token matching any character any number of times, but as few as possible, and ensuring not to match </div>
\K Resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match.
\d{1,9}\.\d{2} Match 1-9 digits, followed by a literal dot ., followed by exactly 2 digits
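
To see the pieces described above working together, here is a small sketch that lists what the pattern actually reports as matches. The input string and the `~` delimiter are made up for illustration. Because of \K, each reported match is just the number, and because of \G, the second number is found by continuing from the first match rather than from a new <div> tag.

```php
<?php
// Same pattern as in the answer; `~` delimiter avoids escaping `/` in `</div>`.
$pattern = '~(?:<div[^>]*>|\G(?!\A))(?:(?!</div>).)*?\K\d{1,9}\.\d{2}~';

// Illustrative input: one number before the <div>, two inside, one after.
$input = 'a 1.00 <div>b 2.50 c 3.75</div> d 4.00';

// PREG_OFFSET_CAPTURE returns each match as [text, byte offset].
preg_match_all($pattern, $input, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$text, $offset]) {
    echo "$text at byte offset $offset", PHP_EOL;
}
// 2.50 at byte offset 14
// 3.75 at byte offset 21
```

The numbers 1.00 and 4.00 are never reported: the first comes before any <div>, and the chain of \G matches stops at </div> before the last one.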

Please note that this approach will break if the <div> is not on the same line as the text. — Andy Lester, Dec 08 '17 at 19:49
@AndyLester [the `s` modifier can be used](https://regex101.com/r/AZd896/3). I added a note to my answer including this logic. — ctwheels, Dec 08 '17 at 19:51

score 1 · Answer 2 · answered Dec 08 '17 at 19:21

While parsing HTML with regex sounds like fun, please use a proper XML parser instead. You can use the following DOMDocument code to achieve this functionality:

<?php

$html = 'dasdfasdf 355.56 asdfasd
<div class="sdaf">sdfsad 36546545643.00 asdfa sdf sadfasdf 544.45 sadfs</div>';

$doc = new DOMDocument();
// The LIBXML flags ask libxml not to add <html>/<body> wrappers or a doctype.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div)
{
    // nodeValue is the text content of the <div>, without any markup.
    $innerText = $div->nodeValue;
    // Replace only the currency-style numbers found in that text.
    $div->nodeValue = preg_replace('/\d{1,9}\.\d{2}/', 'whatever', $innerText);
}

$html = $doc->saveHTML();

var_dump($html);

With this approach, the regex only has to process the text inside each <div>, not the HTML markup itself.
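
One caveat with the code above: assigning to $div->nodeValue replaces everything inside the <div> with plain text, so nested tags such as <b> would be lost. If that matters, a possible variation is to edit the individual text nodes instead. The sketch below assumes a made-up input with a nested <b> element, uses DOMXPath to select the text nodes, and loads the HTML with default options, so libxml adds its usual <html>/<body> wrappers.

```php
<?php
// Illustrative input: a number outside the <div>, and two inside it,
// one of which sits in a nested <b> element.
$html = '<p>Price 355.56</p><div class="sdaf">Was <b>544.45</b>, now 12.00</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// Select every text node that has a <div> ancestor, at any depth.
foreach ($xpath->query('//div//text()') as $textNode) {
    // 'X' is a placeholder replacement chosen for this example.
    $textNode->nodeValue = preg_replace('/\d{1,9}\.\d{2}/', 'X', $textNode->nodeValue);
}

// Serialize only the <div> to inspect the result.
$div = $doc->getElementsByTagName('div')->item(0);
echo $doc->saveHTML($div), PHP_EOL;
```

The <b> element stays in place because only the text nodes are modified, and the number in the <p> is untouched because it has no <div> ancestor.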