1

I'm using Node.js to pull in HTML from an API, and I store it in a variable before I display it. I need to replace a link in that HTML string, but I can only search on the front part of the link because the rest of each URL is dynamic.

I found an example that would work great using document.querySelectorAll("a[href^='http://somelink.com/12345678']")

Javascript getElement by href?

But I'm not using the DOM.

Dynamic Links that need to be removed/replaced:

<a href="http://somelink.com/12345678-asldkfj>Click Here</a>
<a href="http://somelink.com/12345678-clbjj>Click Here</a>
<a href="http://somelink.com/12345678-2lksjd>Click Here</a>

What I can search on:

<a href="http://somelink.com/12345678

I need to either change the actual link name "Click Here" or remove the element.

Any ideas how to achieve this with plain JS? Initially I'm thinking maybe there is a way to create a fake/temp DOM?

EDIT: Modifying the answer below with my code, it did exactly what I needed.

var str = '<a href="http://somelink.com/12345678-asldkfj">Click Here</a><a href="http://somelink.com/12345678-clbjj">Click Here</a><a href="http://somelink.com/12345678-2lksjd">Click Here</a>';
var div = document.createElement("div");
div.innerHTML = str;

var links = div.querySelectorAll("a[href^='http://somelink.com/12345678']");

for (var i = 0; i < links.length; i++) {
    // Swap each matching anchor's markup in the original string for plain text.
    str = str.replace(links[i].outerHTML, 'New Name');
}

console.log(str);
Keith

3 Answers

3

You get nothing back because the href attributes of your links are not terminated correctly: each one is missing the closing " at the end. Once you fix that, everything will be fine.

Otherwise, if you aren't working with an HTML document and its DOM, you can load your HTML string into a temporary DOM element like this:

var str = '  <a href="http://somelink.com/12345678-asldkfj">Click Here</a>'
  + '<a href="http://somelink.com/12345678-clbjj">Click Here</a>'
  + '<a href="http://somelink.com/12345678-2lksjd">Click Here</a>';
var div = document.createElement("div");
div.innerHTML = str;

var links = div.querySelectorAll("a[href^='http://somelink.com/12345678']");
console.log(links);

Note:

To use this code in a Node.js environment, where there is no built-in document object, you will need to use a DOM parser module.

cнŝdk
  • Thanks, this will work, I can then do a str = str.replace(links[i].outerHTML, ''); to clear the link out completely. Others are saying to not use the temp DOM element though. – Keith Sep 22 '17 at 22:06
  • @user3712837 in that case you will need `str = str.replace(links, '');`, it will remove all the links together, or you can just loop over links and remove them separately. – cнŝdk Sep 22 '17 at 22:44
  • One issue though is i'm using node and there is no document. – Keith Sep 22 '17 at 23:40
  • Well, for that you will need to use a DOM module in nodejs ;) – cнŝdk Sep 22 '17 at 23:59
  • Haha thanks, I just put that in and it looks like it's working. Using JSDOM from the answer below. – Keith Sep 23 '17 at 00:12
1

A fake DOM would be serious overkill here. All you need is a string replace. If you know for sure that your strings are safe, then this example should be sufficient.

Edit: Added parsing an html string to generate the array of links to work on and added replacement of innerText.

To get an array of links from an html string:

  • Match <a, followed by 0 or more not >, followed by >, followed by shortest possible string to rest of the match, followed by </a>

  • This pattern includes capture groups for the opening/closing tags because then we can reuse the same pattern to replace the anchor's innerText later.

To replace the href of each link:

  • Match href=", followed by 1 or more not ", followed by "
  • Replacing the full match with href=", followed by new url, followed by ".

To replace innerText of anchor:

  • Match (<a, followed by 0 or more not >, followed by >), followed by shortest possible string to rest of the match, followed by (</a>), capturing the opening tag in $1 and closing tag in $3.
  • Replace string with opening tag, followed by new text, followed by closing tag.

const linksHtml = document.querySelector('#links').innerHTML

// Note that capture group 2 will not actually capture "shortest string" even 
// though it matches. $2 in a replace() would return huge useless string.

const anchorPattern = /(<a[^>]*>)(.*?)(<\/a>)/g

const links = linksHtml.match(anchorPattern)

const newUrls = [
  'http://someotherlink.com/cool',
  'http://someotherlink.com/happy',
  'http://someotherlink.com/smile'
]

const newText = [
  'Cool',
  'Happy',
  'Smile',
]

const replaced = links
  // replace urls
  .map( (link, i) =>
    link.replace(/(href=")[^"]+"/, `$1${newUrls[i]}"`)
  )
  // replace innerText
  .map( (link, i) =>
    link.replace(anchorPattern, `$1${newText[i]}$3`)
  )

document.querySelector('pre')
  .innerText = JSON.stringify(replaced,null,2)
<div id="links">
  <h2>Probably will be a header.</h2>
  <a href="http://somelink.com/12345678-asldkfj">Click Here</a>
  <p>And maybe some random text.</p>
  <a href="http://somelink.com/12345678-clbjj">Click Here</a>
  <p>One of the links might be in a paragraph. <a href="http://somelink.com/12345678-2lksjd">Click Here</a></p>
</div>

<h2>Result: </h2>
<pre></pre>
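
The same regex technique can be narrowed to only the links whose href starts with a known prefix, working on the whole HTML string with no DOM at all. The following is a minimal sketch, not a general HTML parser: it assumes double-quoted href attributes and anchors that are not nested, and the helper names `escapeRegExp` and `replaceLinksByPrefix` are hypothetical, introduced only for illustration.

```javascript
// Hypothetical helper: escape regex metacharacters so the URL prefix is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: for every <a> whose href starts with `prefix`,
// replace its inner text with `newText`, or drop the element when `newText` is null.
// Assumes double-quoted href values and no nested anchors.
function replaceLinksByPrefix(html, prefix, newText) {
  const pattern = new RegExp(
    '(<a\\b[^>]*\\bhref="' + escapeRegExp(prefix) + '[^"]*"[^>]*>)([\\s\\S]*?)(<\\/a>)',
    'g'
  );
  // A replacer function avoids surprises if newText contains "$" sequences.
  return html.replace(pattern, (match, open, text, close) =>
    newText === null ? '' : open + newText + close
  );
}

const html =
  '<h2>Header</h2>' +
  '<a href="http://somelink.com/12345678-asldkfj">Click Here</a>' +
  '<p>Text <a href="http://other.com/page">Keep Me</a></p>';

// Rename only the matching link; the other.com link is left untouched.
const renamed = replaceLinksByPrefix(html, 'http://somelink.com/12345678', 'New Name');
// Remove the matching link entirely.
const removed = replaceLinksByPrefix(html, 'http://somelink.com/12345678', null);

console.log(renamed);
console.log(removed);
```

Because the prefix is built into the pattern, links that point elsewhere never match, so no separate filtering step is needed. For HTML you do not control (single-quoted attributes, unusual whitespace, nested markup), a real parser is the safer choice.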
skylize
  • This is really helpful, any way to just change the 'Click Here' part of that link or remove it completely? The links array you have would be just one big html string though, I don't have them pulled out like that. – Keith Sep 22 '17 at 20:02
  • Updated the answer. It's _all_ in there. – skylize Sep 23 '17 at 15:15
  • Thanks for reworking that again. It seems like it's replacing all links, I just need certain cases. @chsdk's did work, just using the jsDOM but might be over-kill like you suggested. – Keith Sep 25 '17 at 22:48
  • Yes, it's replacing all links because I coded it to replace all links. I don't know _exactly_ what you need, so I wrote a general use example to give you the techniques needed. Study what's actually happening here and then tweak it to your use-case. You can either start by only parsing out the links you want to work on, instead of all links, or you can use [`Array.prototype.filter()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/filter) to split out the ones you actually want to change. – skylize Sep 27 '17 at 12:06
1

You could use string searching or regular expressions to manipulate your HTML string, but you shouldn't unless the HTML is extremely simple. It is easier to import a package that provides DOM parsing and manipulation methods, such as Cheerio (jQuery-like) or jsDOM.

From there you parse the string into a DOM document, query for the links, and either replace their text or remove the elements through the library's methods.

jsDOM Example:

const { JSDOM } = require("jsdom"); // JSDOM is a named export of the jsdom package
const dom = new JSDOM(yourHtmlString);
const document = dom.window.document;

var elements = document.querySelectorAll("a[href^='http://somelink.com/12345678']");

for (let i = 0; i < elements.length; i++) {
  elements[i].textContent = "Replacement text";
  // or elements[i].remove() to remove the link entirely
}

var resultHtml = dom.serialize();

Cheerio Example:

const cheerio = require('cheerio');
const $ = cheerio.load(yourHtmlString); // declare $ instead of leaking an implicit global

$("a[href^='http://somelink.com/12345678']").text('Text to Replace "Click Here"');
//or .remove() if wanting to remove

var htmlResult = $.html();
Patrick Evans