I'm trying to develop a script that will take user-submitted HTML, loop through it to identify matching tags, make adjustments to those matched tags, and then output the resulting HTML as plain text that the user can copy. The end goal is to replace the href values in a submission with different URLs.
So for example, this:
<a href="http://example.com/">Link A</a>
<a data-track="false" href="http://example.com/">Link B</a>
<a href="http://example.com/">Link C</a>
Becomes this:
<a href="http://newurl.com/link1">Link A</a>
<a data-track="false" href="http://example.com/">Link B</a>
<a href="http://newurl.com/link2">Link C</a>
My first thought was to take the submitted HTML from the <textarea> field and put it in a variable. At this point the HTML becomes a string, and I was going to loop through it with a regex to find matching tags. My issue was that I needed to find all <a> tags that did NOT include the attribute data-track="false". And as far as I can tell that's impossible with regex, since each link isn't going to be on its own line.
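For what it's worth, here is a rough sketch of how the string approach might still work if the exclusion is not put in the pattern itself: match every opening <a> tag, then check for data-track="false" inside a replace callback. The function name rewriteLinks, the baseUrl parameter, and the link1, link2 numbering are placeholders I made up for illustration. Matching HTML with a regex like this is fragile (a > inside an attribute value would break it), so treat it as an illustration under those assumptions, not a robust solution.

```javascript
// Hypothetical helper: the name, the baseUrl parameter, and the linkN
// numbering scheme are assumptions made for this sketch.
function rewriteLinks(html, baseUrl) {
  let counter = 0;
  // Match each opening <a ...> tag. [^>]* also matches newlines, so the
  // links do not need to be on their own lines.
  return html.replace(/<a\b[^>]*>/gi, (tag) => {
    // Leave tags that opt out with data-track="false" untouched.
    if (/\sdata-track\s*=\s*(["'])false\1/i.test(tag)) {
      return tag;
    }
    // Swap the href value; the counter only advances when an href is found.
    return tag.replace(/(\s)href\s*=\s*(["'])[^"']*\2/i, (attr, space, quote) => {
      counter += 1;
      return `${space}href=${quote}${baseUrl}link${counter}${quote}`;
    });
  });
}
```

Calling rewriteLinks(submittedHtml, "http://newurl.com/") on the three example links above should leave Link B alone and number the other two in order.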
My second thought was to loop through it using jQuery where I could use something like this:
$("a:not([data-track='false'])");
But I can't use jQuery like this on a string, right? It needs to be in the DOM.
I'm unsure of the best way to go about doing this. Maybe another language would prove helpful, but other than HTML and CSS, JavaScript and jQuery are the only ones I'm experienced with.
Any and all help would be greatly appreciated.