I'm writing a Firefox extension. I want to go through all of the page's plain text (so not JavaScript or image sources) and replace certain strings. I currently have this:

var text = document.documentElement.innerHTML;

var anyRemaining = true;
do {    
    var index = text.indexOf("search");
    if (index != -1) {
        // This does not just replace the string with something else, 
        // there's complicated processing going on here. I can't use 
        // string.replace().
    } else {
        anyRemaining = false;
    }
} while (anyRemaining);

This works, but it also goes through non-text content and markup such as JavaScript, and I only want it to process the visible text. How can I do this?

I'm currently thinking of detecting an open bracket and continuing at the next closing bracket, but there might be better ways to do this.

foxite
  • [javascript replace text in the html body](http://stackoverflow.com/a/25699092/215552) seems to do what you want... – Heretic Monkey Nov 22 '16 at 18:07
  • Check out this [texthighlight function](https://github.com/wet-boew/wet-boew/blob/master/src/plugins/texthighlight/texthighlight.js) and a [demo page](https://wet-boew.github.io/v4.0-ci/demos/texthighlight/texthighlight-en.html?txthl=avian%20influenza+world+cook+flu-like%20symptoms+Don%27t%20Forget...+causes%20sickness%20in%20birds,%20it%20can%20also%20infect%20people.) – thekodester Nov 22 '16 at 18:07
  • You can try element.textContent to get the text without the HTML instead of innerHTML – Yash Dayal Nov 22 '16 at 18:22
  • @YashDayal I tried that, but reassigning the textContent broke everything. – foxite Nov 22 '16 at 18:52

2 Answers

You can use XPath to get all the text nodes on the page and then do your search/replace on those nodes:

function replace(search, replacement) {
 // Note: "//*/text()" matches every text node, including those inside
 // <script> and <style> elements.
 var xpathResult = document.evaluate(
  "//*/text()",
  document,
  null,
  XPathResult.ORDERED_NODE_ITERATOR_TYPE,
  null
 );
 var results = [];
 var res; // declared locally so the loop does not create a global
 // We store the result in an array because if the DOM mutates
 // during iteration, the iteration becomes invalid.
 while ((res = xpathResult.iterateNext())) {
  results.push(res);
 }
 results.forEach(function (res) {
  res.textContent = res.textContent.replace(search, replacement);
 });
}

replace(/Hello/g, 'Goodbye');
<div class="Hello">Hello world!</div>
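
The question says the real processing is more involved than a plain string.replace(). One way to plug such processing into the per-node step is a small helper that walks the matches with indexOf and a moving start offset, so a match is never rescanned. This is only a sketch: `replaceEach` and the `process` callback are hypothetical names, not part of the original answer.

```javascript
// Hypothetical helper: calls process(match, index) for every occurrence of
// `search` in `text` and splices the returned string into the result.
function replaceEach(text, search, process) {
  // indexOf("") matches at every position, so bail out to avoid an endless loop.
  if (search === "") return text;
  let result = "";
  let from = 0;
  let index;
  while ((index = text.indexOf(search, from)) !== -1) {
    // Copy the untouched part, then whatever the callback returns.
    result += text.slice(from, index) + process(search, index);
    // Resume after the match so the replacement itself is never rescanned.
    from = index + search.length;
  }
  return result + text.slice(from);
}
```

Inside results.forEach() this could be used as `res.nodeValue = replaceEach(res.nodeValue, "search", myProcessing);`, where `myProcessing` stands in for the asker's own function.
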
Kyle
  • This solution works. I only had to replace the line in results.forEach() with a call to my processing method. Thank you! – foxite Nov 23 '16 at 18:58
  • No problem. Something I failed to mention about this is that it's not supported in Internet Explorer. – Kyle Nov 23 '16 at 19:07
  • If IE is a problem, you can also use the TreeWalker implementation to get the text nodes shown here: http://stackoverflow.com/a/10730777/701263 – Kyle Nov 23 '16 at 19:09
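
A minimal sketch of the TreeWalker approach mentioned in the comment above, assuming a browser DOM. The names `isPageText`, `collectTextNodes` and `SKIPPED_PARENTS` are made up for illustration, and the list of skipped tags is an assumption that may need adjusting for a given page.

```javascript
// Assumed list of parents whose text is code or styling rather than page text.
const SKIPPED_PARENTS = new Set(["SCRIPT", "STYLE", "NOSCRIPT"]);

// Returns true for text nodes that are worth processing.
function isPageText(textNode) {
  const parent = textNode.parentNode;
  if (!parent) return false;
  if (SKIPPED_PARENTS.has(String(parent.nodeName).toUpperCase())) return false;
  // Ignore whitespace-only nodes between elements.
  return textNode.nodeValue.trim() !== "";
}

// Collects matching text nodes under `root` into an array first, so the
// caller can modify them without disturbing the traversal.
function collectTextNodes(root) {
  const doc = root.ownerDocument || root; // root may be the document itself
  const walker = doc.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
    acceptNode: (node) =>
      isPageText(node) ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT,
  });
  const nodes = [];
  let node;
  while ((node = walker.nextNode())) {
    nodes.push(node);
  }
  return nodes;
}
```

In a page, `collectTextNodes(document.body)` would return the text nodes to process; each node's `nodeValue` can then be reassigned without touching the surrounding markup.
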

You can either use a regex to strip the HTML tags or, perhaps more easily, use a JavaScript function that returns the text without the HTML. For more details, see: How can get the text of a div tag using only javascript (no jQuery)

Yash Dayal
  • I need to replace the text I find, so I need to be able to reassign the HTML contents. I could strip the HTML tags using regex, but that will pretty much break everything. – foxite Nov 22 '16 at 18:45