Search the HTML document's text for certain strings (and replace those)

Question

I'm writing a Firefox extension. I want to go through all of the page's plain text (so not JavaScript or image sources) and replace certain strings. I currently have this:

var text = document.documentElement.innerHTML;

var anyRemaining = true;
do {    
    var index = text.indexOf("search");
    if (index != -1) {
        // This does not just replace the string with something else, 
        // there's complicated processing going on here. I can't use 
        // string.replace().
    } else {
        anyRemaining = false;
    }
} while (anyRemaining);

This works, but it also goes through non-text content and markup such as JavaScript, and I only want it to process the visible text. How can I do this?

I'm currently thinking of detecting an open bracket and continuing at the next closing bracket, but there might be better ways to do this.

[javascript replace text in the html body](http://stackoverflow.com/a/25699092/215552) seems to do what you want... — Heretic Monkey, Nov 22 '16 at 18:07
Checkout this [texthighlight function](https://github.com/wet-boew/wet-boew/blob/master/src/plugins/texthighlight/texthighlight.js) and a [demo page](https://wet-boew.github.io/v4.0-ci/demos/texthighlight/texthighlight-en.html?txthl=avian%20influenza+world+cook+flu-like%20symptoms+Don%27t%20Forget...+causes%20sickness%20in%20birds,%20it%20can%20also%20infect%20people.) — thekodester, Nov 22 '16 at 18:07
You can try element.textContent to get the text without the HTML instead of innerHTML — Yash Dayal, Nov 22 '16 at 18:22
@YashDayal I tried that, but reassigning the textContent broke everything. — foxite, Nov 22 '16 at 18:52

score 2 · Accepted Answer · answered Nov 22 '16 at 19:03

You can use XPath to get all the text nodes on the page and then do your search/replace on those nodes:

function replace(search, replacement) {
 var xpathResult = document.evaluate(
  "//*/text()",
  document,
  null,
  XPathResult.ORDERED_NODE_ITERATOR_TYPE,
  null
 );
 var results = [];
 var res; // declared locally so the loop does not create an implicit global
 // We store the result in an array because if the DOM mutates
 // during iteration, the iteration becomes invalid.
 while ((res = xpathResult.iterateNext())) {
  results.push(res);
 }
 results.forEach(function (node) {
  node.textContent = node.textContent.replace(search, replacement);
 });
}

replace(/Hello/g,'Goodbye');

<div class="Hello">Hello world!</div>
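Note that `//*/text()` also matches the text inside `<script>` and `<style>` elements, which the question wanted to leave alone. The following is a minimal sketch of a narrower expression, assuming an HTML document without namespace prefixes. The names `buildTextXPath` and `processVisibleText` and the list of skipped tags are illustrative assumptions, not part of the answer above. It uses a snapshot result, which stays valid while the DOM is modified, and takes a callback so that arbitrary processing can stand in for `String.replace`.

```javascript
// Hypothetical helper: builds an XPath selecting text nodes under <body>,
// excluding any that sit inside one of the listed elements.
function buildTextXPath(skipTags) {
  var conditions = skipTags.map(function (tag) {
    return "not(ancestor::" + tag + ")";
  });
  var predicate = conditions.length ? "[" + conditions.join(" and ") + "]" : "";
  return "//body//text()" + predicate;
}

// Hypothetical wrapper: `process` receives each text node's string and
// returns the new string.
function processVisibleText(doc, process) {
  var snapshot = doc.evaluate(
    buildTextXPath(["script", "style", "noscript"]), // assumed skip list
    doc,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, // snapshot survives DOM mutation
    null
  );
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    var node = snapshot.snapshotItem(i);
    var updated = process(node.nodeValue);
    if (updated !== node.nodeValue) {
      node.nodeValue = updated; // only touch nodes that actually changed
    }
  }
}

// Usage in a page context (skipped when no DOM is available):
if (typeof document !== "undefined") {
  processVisibleText(document, function (text) {
    return text.replace(/Hello/g, "Goodbye");
  });
}
```

Because only text nodes are rewritten, attribute values such as `class="Hello"` in the demo markup are left untouched.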

Kyle


This solution works. I only had to replace the line in results.forEach() with a call to my processing method. Thank you! – foxite Nov 23 '16 at 18:58
No problem. Something I failed to mention about this is that it's not supported in Internet Explorer. – Kyle Nov 23 '16 at 19:07
If IE is a problem, you can also use the TreeWalker implementation to get the text nodes shown here: http://stackoverflow.com/a/10730777/701263 – Kyle Nov 23 '16 at 19:09
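For reference, the TreeWalker approach mentioned in the last comment could look roughly like the sketch below. This is not the code from the linked answer. The names `isSkippedParent` and `collectTextNodes` and the skip list are assumptions made for illustration. The nodes are collected first and modified afterwards, for the same reason as in the XPath version.

```javascript
// Assumed list of elements whose text should not be treated as page text.
var SKIPPED_PARENTS = ["SCRIPT", "STYLE", "NOSCRIPT", "TEXTAREA"];

// Hypothetical helper: true if text directly inside this element is skipped.
function isSkippedParent(nodeName) {
  return SKIPPED_PARENTS.indexOf(String(nodeName).toUpperCase()) !== -1;
}

// Hypothetical helper: returns the text nodes below `root` in document order.
function collectTextNodes(root) {
  var walker = root.ownerDocument.createTreeWalker(
    root,
    NodeFilter.SHOW_TEXT, // visit text nodes only
    null,
    false // legacy fourth argument, ignored by current browsers
  );
  var nodes = [];
  var node;
  while ((node = walker.nextNode())) {
    if (!isSkippedParent(node.parentNode.nodeName)) {
      nodes.push(node);
    }
  }
  return nodes;
}

// Usage in a page context (skipped when no DOM is available):
if (typeof document !== "undefined") {
  collectTextNodes(document.body).forEach(function (node) {
    node.nodeValue = node.nodeValue.replace(/Hello/g, "Goodbye");
  });
}
```

The parent check is done in the loop rather than in a `NodeFilter` callback only to keep the sketch short. Either placement should give the same set of nodes.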

score 0 · Answer 2 · edited May 23 '17 at 11:59

You can either use a regex to strip the HTML tags or, which might be easier, use a JavaScript function that returns the text without the HTML. See this for more details: How can get the text of a div tag using only javascript (no jQuery)

answered Nov 22 '16 at 18:39

Yash Dayal

I need to replace the text I find, so I need to be able to reassign the HTML contents. I could strip the HTML tags using regex, but that will pretty much break everything. – foxite Nov 22 '16 at 18:45
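To illustrate the objection in the comment above, here is a small sketch of regex-based tag stripping. The function `stripTags` and its simple pattern are assumptions for illustration only and are not a robust HTML parser. It shows two drawbacks: the markup is gone, so the result cannot be written back without flattening the page, and script source ends up in the text.

```javascript
// Hypothetical helper: removes anything that looks like a tag.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

var markup = '<p>Hello <a href="/x">world</a></p>';
var text = stripTags(markup);
// text === "Hello world": the <a> element and its href are lost

var withScript = "<script>var a = 1;</script>Hi";
var leaked = stripTags(withScript);
// leaked === "var a = 1;Hi": the script source is still part of the text
```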
