How to get numbers in elements' inner text by javascript's regex

Question

I want to use a JavaScript regex to find the digits in the inner text of an HTML document so that I can replace them.
For example, in the code below I want to match 1,2,3,4,5,6,1,2,3,1,2,3, but not the 444 inside the div tag.

<body>
  aaaa123aaa456
  <div style="background: #444">aaaa123aaaa</div>
  aaaa123aaa
</body>

What could be the regular expression?

You can't (reliably): http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Will, Feb 19 '13 at 20:21
but regular expressions can find almost every sub string in a string. — Erfan, Feb 20 '13 at 11:54

score 4 · Answer 1 · edited May 23 '17 at 12:04

Your best bet is to use innerText or textContent to get at the text without the tags and then just use the regex /\d/g to get the numbers.

function digitsInText(rootDomNode) {
  var text = rootDomNode.textContent || rootDomNode.innerText;
  return text.match(/\d/g) || [];
}

For example,

alert(digitsInText(document.body));

If your HTML is not in the DOM, you can try to strip the tags yourself; see the question "JavaScript: How to strip HTML tags from string?"

Since you need to do a replacement, I would still try to walk the DOM and operate on text nodes individually, but if that is out of the question, try

var HTML_TOKEN = /(?:[^<\d]|<(?!\/?[a-z]|!--))+|<!--[\s\S]*?-->|<\/?[a-z](?:[^">']|"[^"]*"|'[^']*')*>|(\d+)/gi;

function incrementAllNumbersInHtmlTextNodes(html) {
  return html.replace(HTML_TOKEN, function (all, digits) {
    if ("string" === typeof digits) {
      return "" + (+digits + 1);
    }
    return all; // tags, comments and non-digit text pass through unchanged
  });
}

then

incrementAllNumbersInHtmlTextNodes(
    '<b>123</b>Hello, World!<p>I <3 Ponies</p><div id=123>245</div>')

produces

    '<b>124</b>Hello, World!<p>I <4 Ponies</p><div id=123>246</div>'

It can get confused about where special elements such as <script> end, and it won't recognize entity-encoded digits, but it should work otherwise.
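The DOM walk recommended above (operating on text nodes one at a time) could be sketched roughly as follows. The names `replaceDigitsInTextNodes` and `replacer` are hypothetical and not part of the original answer, and skipping SCRIPT and STYLE elements is an added assumption, made so that code and CSS are left alone.

```javascript
// Sketch: replace digits in every text node under `root`, leaving attributes untouched.
// `replaceDigitsInTextNodes` and `replacer` are illustrative names (assumptions).
function replaceDigitsInTextNodes(root, replacer) {
  var TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers
  var children = root.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    var node = children[i];
    if (node.nodeType === TEXT_NODE) {
      // Only the text is rewritten; tag names and attribute values are never seen here.
      node.nodeValue = node.nodeValue.replace(/\d/g, replacer);
    } else if (node.nodeName !== 'SCRIPT' && node.nodeName !== 'STYLE') {
      // Recurse into ordinary elements; skip script and style content (assumption).
      replaceDigitsInTextNodes(node, replacer);
    }
  }
}

// In a browser this might be called once for the whole page, for example:
// replaceDigitsInTextNodes(document.body, function (d) { return String((+d + 1) % 10); });
```

A single call covers the whole subtree, so the replacement still happens in one pass from the caller's point of view.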

Thank you for your answer, but I'm still trying to find a regular expression that can match the inner text of elements in a string. — Erfan, Feb 20 '13 at 12:00
Mike Samuel, I need to replace the numbers. With your approach I would have to call a function over and over, but with a regular expression that can parse the whole HTML as a string I can replace the numbers at once. — Erfan, Feb 21 '13 at 13:01

score 0 · Answer 2 · answered Feb 19 '13 at 20:31

You don't necessarily need a RegExp to get the text content of an element while excluding that of its descendant elements. In fact, I'd advise against it, since matching HTML with a RegExp is notoriously difficult. There are DOM solutions:

function getImmediateText(element){
    var text = '';

    // Text and elements are all DOM nodes, so loop over the immediate children.
    for(var i = 0, l = element.childNodes.length; i < l; ++i){
        var node = element.childNodes[i];
        // nodeType 3 is a text node
        if(node.nodeType === 3){
            text += node.nodeValue;
        }
    }

    return text;
}

var bodyText = getImmediateText(document.getElementsByTagName('body')[0]);

So here is a function that returns only the immediate text content as a string. You could then extract the numbers from it with a RegExp, using something like this (the || [] guards against match returning null when there are no digits):

var numberString = (bodyText.match(/\d+/g) || []).join('');

score 0 · Accepted Answer · answered Aug 17 '21 at 20:54

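Just to answer my old question: it is possible with a lookahead.
The lookahead accepts a digit only if the next angle bracket after it is < (or the input ends), which means the digit is not inside a tag.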

/\d(?=[^<>]*(<|$))/g

To replace the numbers, where map is a lookup from each digit to its replacement:

    html.replace(/\d(?=[^<>]*(<|$))/g, function($0) {
        return map[$0]
    });

Source of the answer: https://www.drupal.org/node/619198#comment-5710052
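As a rough illustration, here is the lookahead pattern applied to the HTML from the question. The names `TEXT_DIGIT`, `replaceTextDigits` and `digitMap` are hypothetical. The only change to the pattern is a non-capturing group, which should not affect what it matches.

```javascript
// Sketch only: TEXT_DIGIT, replaceTextDigits and digitMap are illustrative names.
// A digit matches only when the next angle bracket after it is "<" or the input ends.
var TEXT_DIGIT = /\d(?=[^<>]*(?:<|$))/g;

function replaceTextDigits(html, digitMap) {
  return html.replace(TEXT_DIGIT, function (digit) {
    // Fall back to the original digit when the map has no entry for it.
    return Object.prototype.hasOwnProperty.call(digitMap, digit) ? digitMap[digit] : digit;
  });
}

var sample = '<body>\n  aaaa123aaa456\n  <div style="background: #444">aaaa123aaaa</div>\n  aaaa123aaa\n</body>';

// Digits in text are found; the 444 inside the style attribute is not.
var found = sample.match(TEXT_DIGIT);
// found: ['1','2','3','4','5','6','1','2','3','1','2','3']
```

The pattern relies on angle brackets appearing only as tag delimiters. A literal > in text hides the digits before it (in '5 > 3' only the 3 matches), and a < inside an attribute value would expose attribute digits, so this seems best suited to markup you control.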
