I want to change the phone numbers in my webpage into links without breaking the HTML of the page. What is the right way to do this?

I have read this answer: RegEx match open tags except XHTML self-contained tags

However, there is a Skype plugin that somehow replaces numbers in a webpage. How does it do that?

Here is my code:

var formats = '(xxx) xxx-xxxx|(xxx)xxx-xxxx|xxx-xxx-xxxx|xxx.xxx.xxxx|xxx xxx xxxx';
var str = '('+formats.replace(/([\(\)\+\-])/g, '\\$1').replace(/x/g,'\\d') + ')';

var r = RegExp(str,'g');
document.body.innerHTML=document.body.innerHTML.replace(r,'<a style="color:#07C !important; font-size:100% !important;" href="https://call.com/number=$1">$1</a>');

The issue I'm facing is that it messes with the attributes of tags in the body. For example:

<a href="https://stackoverflow.com/a/4338544/1269037">validate phone numbers properly</a>

is replaced with broken HTML:

<a href="https://stackoverflow.com/a/&lt;a style=" color:#07c="" !important;="" font-size:100%="" !important;"="">4338544/1269</a>

and the surrounding code is all messed up.

I think the RegEx pattern is not well defined.

1 Answer

Using regular expressions to parse and process HTML code is a near-impossible task. There are always boundary cases that will be missed.
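
To make one such boundary case concrete, here is a small sketch that runs the pattern from the question against the markup from the question, treated as a plain string the way innerHTML is. The variable names `html` and `matches` are mine. It suggests that the unescaped dots in the `xxx.xxx.xxxx` format are what let the pattern match inside the href attribute:

```javascript
// The pattern from the question, rebuilt unchanged.
var formats = '(xxx) xxx-xxxx|(xxx)xxx-xxxx|xxx-xxx-xxxx|xxx.xxx.xxxx|xxx xxx xxxx';
var str = '(' + formats.replace(/([\(\)\+\-])/g, '\\$1').replace(/x/g, '\\d') + ')';
var r = RegExp(str, 'g');

// The markup from the question, as a plain string.
var html = '<a href="https://stackoverflow.com/a/4338544/1269037">' +
           'validate phone numbers properly</a>';

// The dots in "xxx.xxx.xxxx" are not escaped, so each one matches any character.
// The pattern therefore finds a "phone number" inside the href attribute.
var matches = html.match(r);
console.log(matches); // [ '4338544/1269' ]
```

Escaping the dots would remove this particular match, but a real phone number inside an attribute value would still be rewritten, which is why working on the HTML string is fragile.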

A sounder method is to use the Document Object Model (DOM): walk through all text nodes and process the text of each node in isolation. When there is a match, use the DOM again to add the link element(s).

Here is a working snippet that does just that, making use of a TreeWalker:

// Prepare search expression:
var formats = ['(xxx) xxx-xxxx',
               '(xxx)xxx-xxxx',
               'xxx-xxx-xxxx'];
var str = formats.join('|')         // join patterns with the OR operator
    .replace(/[()+]/g, '\\$&')      // escape special characters
    .replace(/-/g, '[-. ]')         // hyphen can be space or dot as well
    .replace(/(^|[|])x/g, '$1\\bx') // require first digit to be start of a word
    .replace(/x($|[|])/g, 'x\\b$1') // require last digit to be end of a word
    .replace(/x/g, '\\d')           // set digit placeholders
;
var r = RegExp('(' + str + ')');
var node;
// create a walker for visiting all text nodes in the document
var walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
while (node = walker.nextNode()) {
    // Do not process SCRIPT, OPTION and some other tag contents
    // You might need to extend this black-list:
    if (node.parentNode.tagName.search(
            /SCRIPT|SELECT|OPTION|BUTTON|TEXTAREA/) === -1) {
        // split text of node into parts <non-phone><phone><non-phone>...
        var parts = node.nodeValue.split(r);
        while (parts.length > 1) {
            var txt = parts.shift();
            if (txt.length) {
                // insert a text node for the non-phone text:
                node.parentNode.insertBefore(document.createTextNode(txt), node);
            }
            // get phone number, create a link for it
            var phone = parts.shift();
            var a = document.createElement('a');
            // set hyperlink, and pass digits only as URL argument:
            a.setAttribute('href',
                           'https://call.com/number=' + phone.replace(/[^\d]/g, ''));
            a.setAttribute('style', 
                           'color:#07C !important; font-size:100% !important;');
            a.textContent = phone;
            // insert link into the document
            node.parentNode.insertBefore(a, node);
        }
        // reduce the original node to the ending non-phone part
        node.nodeValue = parts[0];
    }
}

The HTML that the snippet runs against:

This is a test.
Following are valid:<br/>
<ul>
    <li>Please dial:473-299-8154</li>
    <li>or 678.269-1514, during weekends</li>
    <li>Private (732 939 8549)</li>
    <li>Back-up =(673) 137.4892</li>
</ul>
 Do not match any of these:<br/>
<ul>
    <li>a473-299-8154 because of a</li>
    <li>473-299-81549 because of last 9</li>
    <li>473/299.8154 because of slash</li>
</ul>
 Some elements whose content should not be parsed:
<form id="myform">
    <select id="sel">
        <option value="phone">123.456.7890</option>
    </select>
    <input  id="inp" type="text" value="123-321-1231">
    <button>123-321-1231</button><br/>
    <textarea>Links are not allowed in textareas:
123-321-1231</textarea>
</form>
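
The splitting step can also be checked without a DOM. The sketch below builds the same expression as the snippet and applies it to a few of the test sentences. `splitPhones` and `phoneRegex` are hypothetical names used only for this illustration:

```javascript
// Build the same expression as in the snippet above.
var formats = ['(xxx) xxx-xxxx', '(xxx)xxx-xxxx', 'xxx-xxx-xxxx'];
var source = formats.join('|')
    .replace(/[()+]/g, '\\$&')       // escape special characters
    .replace(/-/g, '[-. ]')          // hyphen can be space or dot as well
    .replace(/(^|[|])x/g, '$1\\bx')  // first digit must start a word
    .replace(/x($|[|])/g, 'x\\b$1')  // last digit must end a word
    .replace(/x/g, '\\d');           // set digit placeholders
var phoneRegex = RegExp('(' + source + ')');

// Hypothetical helper: because the regex has one capturing group,
// split() returns [text, phone, text, phone, ..., text].
function splitPhones(text) {
    return text.split(phoneRegex);
}

console.log(splitPhones('or 678.269-1514, during weekends'));
// [ 'or ', '678.269-1514', ', during weekends' ]
console.log(splitPhones('473-299-81549 because of last 9'));
// [ '473-299-81549 because of last 9' ]
```

An array of length 1 means no phone number was found, which is why the loop in the snippet only runs while `parts.length > 1`.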
trincot
  • Nice method! I'm going to use your method, it seems a lot better. Thanks! –  Dec 31 '15 at 06:48
  • May we use `formats = '(xxx) xxx-xxxx|(xxx)xxx-xxxx|xxx-xxx-xxxx|xxx.xxx.xxxx|xxx xxx xxxx';`? I just want to keep the ability to add new number formats. My RegEx is not good and yours seems better, but how can I generate the RegEx from the number formats that I define in the variable 'formats'? I would like the final RegEx to look similar to yours, but also cover other formats which I will define later –  Dec 31 '15 at 06:52
  • Thank you! Looking forward to that –  Dec 31 '15 at 10:25
  • Thank you very much! You saved me a lot of time. You are very smart and did it very nicely. I would like to buy you a beer, my friend, if I can do it somehow –  Dec 31 '15 at 12:14
  • You're welcome. Once you have 15 points reputation, come back here and give me an up-vote -- equivalent of the beer :-). The reputation is easy to get... – trincot Dec 31 '15 at 12:21
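
Regarding the comment about keeping a single pipe-separated `formats` string: one possible way to generate the expression is sketched below. `buildPhoneRegex` is a hypothetical helper and not part of the answer's snippet. It assumes that `x` stands for one digit and that every other character in a format should match literally, so the dot and space variants have to be listed explicitly:

```javascript
// Hypothetical helper: turns a pipe-separated list of formats
// (x = one digit) into a regular expression with one capturing group.
function buildPhoneRegex(formats) {
    var alternatives = formats.split('|').map(function (format) {
        var body = format
            .replace(/[^\w\s]/g, '\\$&')  // escape punctuation such as ( ) . - +
            .replace(/x/g, '\\d');        // set digit placeholders
        // Only add a word boundary where the format starts or ends with a digit.
        return (/^x/.test(format) ? '\\b' : '') + body +
               (/x$/.test(format) ? '\\b' : '');
    });
    return new RegExp('(' + alternatives.join('|') + ')');
}

var re = buildPhoneRegex(
    '(xxx) xxx-xxxx|(xxx)xxx-xxxx|xxx-xxx-xxxx|xxx.xxx.xxxx|xxx xxx xxxx');

console.log('call 123.456.7890 now'.split(re));
// [ 'call ', '123.456.7890', ' now' ]
```

The result can be used in place of `r` in the snippet. New formats are added by appending them to the string, separated by `|`.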