-2

I'm running a JavaScript snippet that replaces certain words in the text content of pages in my browser.

However, I don't want it to replace those words inside URLs.

UPDATE:

E.g., if I've replaced X with Y and then search for X in a search engine, any link URLs containing X are rewritten with Y. I can't follow those links, because the rewritten URLs either don't exist or point to the wrong place.

document.body.innerHTML = document.body.innerHTML.replace(/word/gi, "newword");

How can I do this?

Legionar
Natalia Sharon
  • What do you mean by "words within urls"? Also what do you mean by "they are broken" ? Can you edit your question to be more specific? – nem035 Jul 06 '16 at 13:48
  • Well then you cannot use the whole innerHTML of the document; you will need to go element by element and replace the text. Plus, your goal will fail when the word has markup in it. `word` will fail. – epascarello Jul 06 '16 at 13:49
  • *replaces certain words in my browser* - you can't replace words in browser. It may replace words in text content. Show your current content and expected result – RomanPerekhrest Jul 06 '16 at 13:49
  • replace "resign" with "fed to the lions" - but if a url has the word "resign" in it, it doesn't work. – Natalia Sharon Jul 06 '16 at 13:52
  • How would you like to detect whether it's a URL? A URL can be, e.g., `google.com`, and matching all types of URL would be really broad. – Legionar Jul 06 '16 at 14:03
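
The element-by-element idea from the comments above could be sketched roughly as follows. The helper name `replaceInTextNodes` is hypothetical. It only rewrites text nodes, so attribute values such as `href` are never touched; visible link text that happens to contain the word is still replaced.

```javascript
// Hypothetical helper: walks a DOM subtree and rewrites text nodes only.
function replaceInTextNodes(node, wordRegex, replacement) {
  var TEXT_NODE = 3; // numeric value of Node.TEXT_NODE
  if (node.nodeType === TEXT_NODE) {
    node.nodeValue = node.nodeValue.replace(wordRegex, replacement);
    return;
  }
  // Leave script and style contents alone.
  if (node.nodeName === "SCRIPT" || node.nodeName === "STYLE") {
    return;
  }
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    replaceInTextNodes(children[i], wordRegex, replacement);
  }
}

// In a browser: replaceInTextNodes(document.body, /word/gi, "newword");
```

Because nothing is assigned to innerHTML, existing elements and their event listeners stay in place.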

2 Answers

0

This is really hard to do in general (the problem is too broad), but I suggest doing it in these few steps:

  1. First, match all URLs and store them in an array (e.g. var urls = [];).
  2. Then replace every URL with a unique character sequence that is certain not to occur in the page content (e.g. ~~~~~).
  3. Then do your classic replace, like document.body.innerHTML = document.body.innerHTML.replace(/word/gi, "newword");
  4. Finally, find every occurrence of your special character sequence (~~~~~) in the replaced content and swap each one back, in the same order, for the URLs stored in your array (urls).
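
A minimal sketch of these four steps on a plain string, under some assumptions: the helper name `replaceOutsideUrls` and the sample text are made up for illustration, the URL pattern is deliberately simple (it also stops at quotes and angle brackets so that an href attribute is not swallowed whole), and ~~~~~ is assumed not to appear in the text or in the replacement.

```javascript
// Hypothetical helper: replaces words in text but leaves URLs untouched.
function replaceOutsideUrls(text, wordRegex, replacement) {
  var urlRegex = /https?:\/\/[^\s"'<>]+/g; // simplified URL pattern
  var urls = [];

  // Steps 1 and 2: store each URL and swap it for the placeholder.
  var masked = text.replace(urlRegex, function (url) {
    urls.push(url);
    return "~~~~~";
  });

  // Step 3: the ordinary replacement, which can no longer touch a URL.
  var replaced = masked.replace(wordRegex, replacement);

  // Step 4: put the URLs back in the order they were found.
  var index = 0;
  return replaced.replace(/~~~~~/g, function () {
    return urls[index++];
  });
}

var sample = 'He will resign, see <a href="http://example.com/resign">this</a>.';
var result = replaceOutsideUrls(sample, /resign/gi, "be fed to the lions");
// result: 'He will be fed to the lions, see <a href="http://example.com/resign">this</a>.'
```

In a page this could be applied as document.body.innerHTML = replaceOutsideUrls(document.body.innerHTML, /word/gi, "newword"); but relative URLs and URLs without a scheme are not protected by this pattern.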

Matching URLs:

To match URLs you need a good regex, and that is hard to write. See here, here and here:

...almost anything is a valid URL. There are some punctuation rules for splitting it up. Absent any punctuation, you still have a valid URL.

Check the RFC carefully and see if you can construct an "invalid" URL. The rules are very flexible.

For example ::::: is a valid URL. The path is ":::::". A pretty stupid filename, but a valid filename.

Also, ///// is a valid URL. The netloc ("hostname") is "". The path is "///". Again, stupid. Also valid. This URL normalizes to "///" which is the equivalent.

Something like "bad://///worse/////" is perfectly valid. Dumb but valid.

Anyway, this answer is not meant to give you the best regex, but rather a proof of concept of how to do the string wrapping inside the text with JavaScript.

OK, so let's just use this one: /(https?:\/\/[^\s]+)/g

Again, this is a bad regex. It will have many false positives. However it's good enough for this example.
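
To make the limitations concrete, here is a small check of what this pattern picks up. The sample sentence is made up for illustration.

```javascript
var urlRegex = /(https?:\/\/[^\s]+)/g;
var sampleText = "Docs at http://example.com/page. Also www.example.org and https://example.net/a?b=1";
var matches = sampleText.match(urlRegex);
// matches: ["http://example.com/page.", "https://example.net/a?b=1"]
```

The first match swallows the sentence's trailing period, and www.example.org is missed entirely because it has no scheme.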

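function urlify(text) {
    // Deliberately simple pattern: "http://" or "https://" followed by non-whitespace.
    var urlRegex = /(https?:\/\/[^\s]+)/g;
    // Wrap every match in an anchor element.
    return text.replace(urlRegex, function(url) {
        return '<a href="' + url + '">' + url + '</a>';
    });
    // or alternatively
    // return text.replace(urlRegex, '<a href="$1">$1</a>')
}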

var text = "Find me at http://www.example.com and also at http://stackoverflow.com";
var html = urlify(text);

// html now looks like:
// "Find me at <a href="http://www.example.com">http://www.example.com</a> and also at <a href="http://stackoverflow.com">http://stackoverflow.com</a>"

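So, in sum, try something like this ($$ and .each come from a library such as Prototype or MooTools; adjust the selector to your own page):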

$$('#pad dl dd').each(function(element) {
    element.innerHTML = urlify(element.innerHTML);
});

I hope this helps at least a little.

Community
Legionar
0

Here is a simple solution:

  1. Replace every "word" in the link URLs with a "tempuniqueflag" (note that "word" must not be a substring of "tempuniqueflag"):

    // Only anchors that actually have an href attribute.
    var links = document.querySelectorAll('a[href]');
    for (var i = 0; i < links.length; i++) {
      // The g flag protects every occurrence. The match is case-sensitive,
      // so a URL containing "Word" is not protected.
      links[i].href = links[i].href.replace(/word/g, "tempuniqueflag");
    }

  2. Replace your text content as usual:

    document.body.innerHTML = document.body.innerHTML.replace(/word/gi, "newword");

  3. Bring back the original word in the URLs. Assigning to innerHTML rebuilds the DOM, so the links have to be queried again:

    links = document.querySelectorAll('a[href]');
    for (var j = 0; j < links.length; j++) {
      links[j].href = links[j].href.replace(/tempuniqueflag/g, "word");
    }