
I'm looking to replace text in a webpage (any webpage I want to run it on) using JavaScript. I'm not an expert in JavaScript, so I'm somewhat lost. If I can help it, I would like to avoid jQuery.

Through Google, I've found this Stack Overflow question. But when I inject `document.body.innerHTML = document.body.innerHTML.replace('hello', 'hi');` into a webpage, it messes the page up: the page seems to revert to basic text and formatting.

Also, I'm wondering whether the regex code from here could be used. Again, I'm really not sure how to use it. What it would do is replace only webpage text, not links or filenames.

I'm using Google Chrome, in case that matters.

Numeri
  • If you perform a string replace on `innerHTML` it will regenerate all elements and lose event bindings. – zzzzBov Aug 27 '13 at 20:10
  • And that would cause the above-mentioned issues? – Numeri Aug 27 '13 at 20:13
  • What are you trying to accomplish? Changing 'hello' to 'hi'? or something else? – Ofir Israel Aug 27 '13 at 20:13
  • What do you mean by "any webpage I want to run it on"? Do you have access to the code of all of these sites? Or do you want to do the replacement on sites like http://cnn.com, too? – kol Aug 27 '13 at 20:16
  • Oh, that is just an example. I would really just like to be able to automatically screen certain text. :D – Numeri Aug 27 '13 at 20:17
  • Sorry everyone! Just now (after all that searching!) I saw a [related question](http://stackoverflow.com/questions/5797661/replace-text-in-website-with-chrome-content-script-extension?rq=1) in the Related section that answers my question perfectly! – Numeri Aug 27 '13 at 20:20
  • Note that my answer below will be safer than the answer in the question you just linked to, both for the reason @zzzzBov pointed out and because that approach can modify HTML attributes and break the page rather than just modify the actual text on the page. – Paul Aug 27 '13 at 20:24
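
To make the risk raised in these comments concrete, here is a small sketch that works on a plain HTML string rather than a live page. The markup and the `hello.html` file name are made up for illustration. It suggests why a string replace on markup cannot tell visible text from attributes:

```javascript
// Hypothetical markup: "hello" appears in two attributes and in the visible text.
const html = '<a href="/hello.html" title="hello">hello world, hello again</a>';

// A string pattern replaces only the first occurrence, which here sits inside href.
const firstOnly = html.replace('hello', 'hi');

// A global regex replaces every occurrence, attributes included.
const everywhere = html.replace(/hello/g, 'hi');

console.log(firstOnly);
console.log(everywhere);
```

In both results the link target has changed, which is the kind of breakage that working on text nodes avoids.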

2 Answers


You could perform your replacements on just the text nodes in the DOM:

function replaceTextOnPage(from, to){
  // Build the pattern once: escape regex metacharacters and match globally.
  var pattern = new RegExp(quote(from), 'g');

  getAllTextNodes().forEach(function(node){
    node.nodeValue = node.nodeValue.replace(pattern, to);
  });

  // Collect every text node in the document by walking the DOM recursively.
  function getAllTextNodes(){
    var result = [];

    (function scanSubTree(node){
      if(node.childNodes.length)
        for(var i = 0; i < node.childNodes.length; i++)
          scanSubTree(node.childNodes[i]);
      else if(node.nodeType == Node.TEXT_NODE)
        result.push(node);
    })(document);

    return result;
  }

  // Escape characters that have a special meaning in a regular expression.
  function quote(str){
    return (str+'').replace(/([.?*+^$[\]\\(){}|-])/g, "\\$1");
  }
}

Quote function borrowed from this answer.
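
To illustrate why the escaping step matters, here is a small sketch on a plain string. The sample text is made up; the `quote` helper is repeated so the snippet runs on its own:

```javascript
// Same escaping idea as the quote helper above: prefix regex metacharacters with a backslash.
function quote(str) {
  return (str + '').replace(/([.?*+^$[\]\\(){}|-])/g, '\\$1');
}

const text = 'Price: 1.5 or 125?';

// Unescaped, "." matches any character, so "125" is replaced as well.
const unescaped = text.replace(new RegExp('1.5', 'g'), 'X');

// Escaped, only the literal "1.5" is replaced.
const escaped = text.replace(new RegExp(quote('1.5'), 'g'), 'X');

console.log(unescaped);
console.log(escaped);
```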

Usage:

replaceTextOnPage('hello', 'hi');

Note that you will need to shim `forEach` in older browsers or replace that code with a loop like so:

var nodes = getAllTextNodes();
for(var i = 0; i < nodes.length; i++){
    nodes[i].nodeValue = nodes[i].nodeValue.replace(new RegExp(quote(from), 'g'), to);
}
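
One possible refinement, which is not part of the answer as posted: because the walk starts at `document`, it also visits text inside `<script>` and `<style>` elements. Below is a minimal sketch that takes the starting node as a parameter and skips those two tags. The name `replaceTextUnder` and the skip list are assumptions of this example, and the tag check assumes an HTML document, where `nodeName` is uppercase.

```javascript
var TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers
var SKIP_TAGS = { SCRIPT: true, STYLE: true }; // assumed skip list

// Hypothetical helper: replace every occurrence of `from` in text nodes under `root`.
function replaceTextUnder(root, from, to) {
  // Escape regex metacharacters so `from` is matched literally.
  var pattern = new RegExp(String(from).replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g');

  (function scan(node) {
    if (node.nodeType === TEXT_NODE) {
      node.nodeValue = node.nodeValue.replace(pattern, to);
      return;
    }
    if (SKIP_TAGS[node.nodeName]) return; // leave code and CSS untouched
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) scan(children[i]);
  })(root);
}

// In a browser: replaceTextUnder(document.body, 'hello', 'hi');
```

Passing the root in also makes it possible to limit the replacement to one part of the page.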
Paul
  • I've been trying to implement this in a Chrome Extension, but it only replaces some of the matches. Any more help? (My Chrome is the newest version.) – Numeri Aug 28 '13 at 20:14
  • @Numeri If the phrase occurs more than once in the same text node, you'll need to either keep calling replace until it does nothing or use a [RegExp](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp) with the global flag. I updated my answer to show how to do it using a RegExp. – Paul Aug 28 '13 at 20:19
  • Thanks! I tried regex, but I don't think I quite knew what I was doing (on the javascript side)! :D – Numeri Aug 28 '13 at 20:44
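
The two options mentioned in the comment above can be sketched on a plain string. The helper name `replaceRepeatedly` is made up for this example, and the loop version is only safe when the replacement does not itself contain the search text (otherwise it would never stop):

```javascript
const text = 'hello, hello, hello';

// A string pattern replaces only the first match.
const once = text.replace('hello', 'hi');

// Option 1 (hypothetical helper): call replace until the string stops changing.
function replaceRepeatedly(str, from, to) {
  let previous;
  do {
    previous = str;
    str = str.replace(from, to);
  } while (str !== previous);
  return str;
}
const viaLoop = replaceRepeatedly(text, 'hello', 'hi');

// Option 2: a regular expression with the global flag replaces all matches in one call.
const viaRegex = text.replace(/hello/g, 'hi');

console.log(once, '|', viaLoop, '|', viaRegex);
```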

Recently, I had to solve a similar problem, and I came up with something like this:

<!DOCTYPE html>
<html>
<head>
  <title>HTML JS REPLACE</title>
  <script type="text/javascript">
  function convert(elem) {
    var content = document.getElementById(elem).innerHTML; // get HTML content for the given element
    var pattern = /hello/gi; // global, case-insensitive match
    content = content.replace(pattern,'hi');
    document.getElementById(elem).innerHTML = content; // put the replace content back
  }
  </script>
</head>
<body>
  <div id="content">
    Some text that includes both hello and hi. And a hello.
  </div>
  <script type="text/javascript">
    window.onload = function () { convert('content'); }; // assign a handler instead of calling convert right away
  </script>
</body>
</html>

The resulting page will say:

Some text that includes both hi and hi. And a hi.

while the original source still says:

Some text that includes both hello and hi. And a hello.

There are really only a few tricky bits. First, the script that triggers the conversion is placed at the bottom of body, so the target element already exists by the time the code runs. Second, the text must be wrapped in an element with a unique ID that you can reference from JS. Third, the convert function uses a regular expression that performs a global, case-insensitive replacement of the string "hello" with "hi".
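
A general point about `window.onload` that is easy to get wrong: the property expects a function, and putting a call expression on the right-hand side runs the call immediately and stores its return value. The sketch below uses a plain object named `fakeWindow` as a stand-in for `window` so it can run outside a browser, and `convert` here is a stub that only records its argument:

```javascript
const calls = [];
function convert(id) { calls.push(id); } // stub: records each call

const fakeWindow = {}; // stand-in for window (assumption of this sketch)

// Call expression: convert runs right now, and onload receives undefined.
fakeWindow.onload = convert('content');
const afterCall = { ran: calls.length, handlerType: typeof fakeWindow.onload };

// Function value: nothing runs until the handler is invoked.
fakeWindow.onload = function () { convert('content'); };
const afterAssign = calls.length;

fakeWindow.onload(); // a browser would do this once the page has loaded
const afterFire = calls.length;

console.log(afterCall, afterAssign, afterFire);
```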

Your specific application may instead require capturing all of the occurrences and then processing them in a loop, which may (or may not) cause some performance issues. Be careful :)
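
As one illustration of the capture-then-loop approach, the sketch below collects every match first and then processes them one by one. The currency pattern, the `findAmounts` name, and the sample text are all made up for this example. It relies on `String.prototype.matchAll`, which needs a reasonably modern browser.

```javascript
// Hypothetical helper: find dollar amounts and return them with their positions.
function findAmounts(text) {
  const pattern = /\$(\d+(?:\.\d+)?)/g; // assumed format: "$" followed by a number
  const found = [];
  for (const match of text.matchAll(pattern)) {
    found.push({
      index: match.index,          // where the match starts in the text
      raw: match[0],               // the matched text, e.g. "$12.50"
      value: parseFloat(match[1])  // the numeric part, ready for arithmetic
    });
  }
  return found;
}

const amounts = findAmounts('Lunch was $12.50 and coffee $3.');

// Process the captured matches in a loop, here by summing them.
let total = 0;
for (const amount of amounts) total += amount.value;

console.log(amounts, total);
```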

eksperts
  • You may be interested in trying out the solution in my answer as well. It is safer in the event that the HTML contains the string you are trying to replace (consider trying to replace all `<` with `>`). It also keeps all JavaScript events in place and doesn't need to recreate every DOM element on the page. – Paul Aug 27 '13 at 20:36
  • Thank you so much for the help eksperts! Sorry I can only accept one answer :) – Numeri Aug 27 '13 at 20:40
  • Thanks Paulpro, I will absolutely look into it. My specific application (which I haven't included because it's far too complicated for this case), however, had to match currency with units, which may or may not be enclosed with almost any tags, plus perform mathematical operations based on the numeric value extracted. – eksperts Aug 27 '13 at 20:47