
My text is something like:

<a href="http://example.com/test this now">Stuff</a>

More stuff

<a href="http://example.com/more?stuff goes here">more</a>

I want to replace what's inside the href with a function that will URL Encode just the URL portion.

How would I go about this?

UPDATE Here's what I've tried:

postdata.comment.content = postdata.comment.content.replace(/href=\"(.+?)\"/g, function(match, p1) {
    return encodeURI(p1);
});

Does not do what I would have hoped.

Expected result is:

<a href="http%3A%2F%2Fexample.com%2Ftest%20this%20now">Stuff</a>

More stuff

<a href="http%3A%2F%2Fexample.com%2Fmore%3Fstuff%20goes%20here">more</a>
Shamoon

5 Answers


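The regex matches the complete attribute href="....", but the callback returns only the encoded URL, so the href=" prefix and the closing quote are dropped from the output. Add them back in the replacement, and use encodeURIComponent() rather than encodeURI() to encode the complete URL.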

var string = '<a href="http://example.com/test this now">Stuff</a>';

string = string.replace(/href="(.*?)"/, function(m, $1) {
    return 'href="' + encodeURIComponent($1) + '"';
    //      ^^^^^^                     ^
});

var str = `<a href="http://example.com/test this now">Stuff</a>

More stuff

<a href="http://example.com/more?stuff goes here">more</a>`;

str = str.replace(/href="(.*?)"/g, (m, $1) => 'href="' + encodeURIComponent($1) + '"');

console.log(str);
document.body.textContent = str;
Tushar

For the encoding, you can use encodeURIComponent:

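var links = document.querySelectorAll('a[href]');
for(var i=0; i<links.length; ++i)
  // Read the raw attribute: the .href property returns an already-normalized URL (spaces become %20), which would be double-encoded
  links[i].setAttribute('href', encodeURIComponent(links[i].getAttribute('href')));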
<a href="http://example.com/test this now">Stuff</a>
More stuff
<a href="http://example.com/more?stuff goes here">more</a>

If you only have an HTML string instead of DOM elements, don't use regular expressions. Parse your string with a DOM parser instead.

var codeString = '<a href="http://example.com/test this now">Stuff</a>\nMore stuff\n<a href="http://example.com/more?stuff goes here">more</a>';
var doc = new DOMParser().parseFromString(codeString, "text/html");
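var links = doc.querySelectorAll('a[href]');
for(var i=0; i<links.length; ++i)
  // Read the raw attribute: the .href property returns an already-normalized URL (spaces become %20), which would be double-encoded
  links[i].setAttribute('href', encodeURIComponent(links[i].getAttribute('href')));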
document.querySelector('code').textContent = doc.body.innerHTML;
<pre><code></code></pre>

And note that if you encode the URL entirely, it will be treated as a relative URL.
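A minimal sketch of that last point, using the standard URL constructor to resolve the fully encoded string the way a browser would resolve an href. The page address http://host.test/dir/page.html is a made-up base used only for illustration.

```javascript
// Hypothetical address of the page that contains the link (an assumption).
var base = "http://host.test/dir/page.html";

// The fully encoded value from the question's expected output.
var encoded = encodeURIComponent("http://example.com/test this now");

// "http%3A..." no longer starts with a valid scheme, so it is parsed
// as a relative path and resolved against the base URL.
var resolved = new URL(encoded, base).href;

console.log(resolved);
// http://host.test/dir/http%3A%2F%2Fexample.com%2Ftest%20this%20now
```

The link therefore points at a path on the current site instead of at example.com.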

Oriol
  • Instead of `new DOMParser().parseFromString(codeString, "text/html");`, you could do `(function(){ this.innerHTML = html; [...] }).bind(document.createElement('div'))(html)` (which MAY be a bit faster) – Ismael Miguel May 02 '16 at 22:21
  • @IsmaelMiguel Yes, but only if the string is trusted. Try your approach with `''` – Oriol May 02 '16 at 22:24
  • Very nice point. 20 upvotes for you! I had no idea that Javascript would run when no element is added to the DOM. – Ismael Miguel May 02 '16 at 22:45

Where is this running? If you have a DOM, then you are MUCH better off looping over document.links or document.querySelectorAll("a") than running a regex on HTML. Also, you likely do not want to encode EVERYTHING, only the search part.

var allLinks = document.querySelectorAll("a");
for (var i=0;i<allLinks.length;i++) {
   var search = allLinks[i].search;
   if (search) {
     allLinks[i].search="?"+search.substring(1).replace(/stuff/,encodeURIComponent("something"));
   }
}
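
If all you have is the URL as a string, encoding only the search part could look roughly like the sketch below. encodeQueryOnly is a hypothetical helper name, and the sketch assumes the URL has no #fragment and that parameters are separated by & with = between key and value.

```javascript
// Hypothetical helper: encode only what follows the first "?".
// Assumes no "#fragment" and "&"-separated key=value pairs.
function encodeQueryOnly(url) {
  var q = url.indexOf("?");
  if (q === -1) return url; // no query string, nothing to encode

  var encoded = url.slice(q + 1).split("&").map(function (pair) {
    var eq = pair.indexOf("=");
    if (eq === -1) return encodeURIComponent(pair); // bare key without a value
    // Encode key and value separately so the "=" separator survives
    return encodeURIComponent(pair.slice(0, eq)) + "=" +
           encodeURIComponent(pair.slice(eq + 1));
  }).join("&");

  return url.slice(0, q + 1) + encoded;
}

console.log(encodeQueryOnly("http://example.com/more?stuff goes here"));
// http://example.com/more?stuff%20goes%20here
```

The scheme, host and path are left untouched, so the result is still an absolute URL.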

In case you really DO want to have encoded hrefs then

for (var i=0;i<allLinks.length;i++) {
   var href = allLinks[i].href;
   if (href) {
     allLinks[i].href=href.replace(/stuff/,encodeURIComponent("something"));
   }
}
mplungjan
  • It's pretty clear from the question OP wants to encode the *whole* URL. – cat May 02 '16 at 22:05
  • Yes, `http%3A%2F%2Fexample.com%2Ftest%20this%20now` is probably worthless for most things but it's what OP wants. – cat May 03 '16 at 02:32
  • What he thinks he wants. Have a search for "what is the X/Y problem" anyway - updated to include href – mplungjan May 03 '16 at 03:58

Disclaimer: Don't use regex to parse HTML
(too many reasons to list here..)

But, if you insist, this might work.

Find /(<[\w:]+(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>)/

Replace $1$2 + someEncoding( $3 ) + $2$4

Expanded

 (                             # (1 start)
      < [\w:]+ 
      (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
      \s 
      href \s* = \s* 
 )                             # (1 end)
 (?:
      ( ['"] )                      # (2)
      (                             # (3 start)
           [\S\s]*? 
      )                             # (3 end)
      \2 
 )
 (                             # (4 start)
      (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
      >
 )                             # (4 end)
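
In JavaScript, the find/replace above could be wired up roughly as follows. This is a sketch only: encodeHrefs is a hypothetical wrapper name, and encodeURIComponent stands in for someEncoding.

```javascript
// The pattern from above, with the g flag so every tag is processed.
var HREF_RE = /(<[\w:]+(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>)/g;

// Hypothetical wrapper: $1 and $4 are the tag text before and after the
// attribute value, $2 is the quote character, $3 is the URL itself.
function encodeHrefs(html) {
  return html.replace(HREF_RE, function (match, before, quote, url, after) {
    return before + quote + encodeURIComponent(url) + quote + after;
  });
}

var html = '<a href="http://example.com/test this now">Stuff</a>\n' +
           "<a href='http://example.com/more?stuff goes here'>more</a>";

console.log(encodeHrefs(html));
```

Because the quote character is captured in group 2, single-quoted and double-quoted attributes both keep their original quoting.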
  • 2
    "too many reasons to list here" *the song of re̸gular exp​ression parsing will exti​nguish the voices of mor​tal man from the sp​here I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t​he final snuffing of the lie​s of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST the pon̷y he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ* – cat May 03 '16 at 02:30
  • that's the only reason you need – cat May 03 '16 at 02:30

Your expected string "http%3A%2F%2Fexample.com%2Ftest%20this%20now" is what encodeURIComponent("http://example.com/test this now") produces, not encodeURI, which leaves reserved characters such as ":", "/" and "?" unescaped:

var str = '<a href="http://example.com/test this now">Stuff</a>More stuff<a href="http://example.com/more?stuff goes here">more</a>';
str = str.replace(/href="(.+?)"/g, function (m, p1) {
    // Re-emit the attribute name and quotes, which are part of the match
    return 'href="' + encodeURIComponent(p1) + '"';
});

console.log(str);
// <a href="http%3A%2F%2Fexample.com%2Ftest%20this%20now">Stuff</a>More stuff<a href="http%3A%2F%2Fexample.com%2Fmore%3Fstuff%20goes%20here">more</a>
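
To make the difference concrete, here is a small side-by-side sketch of the two built-in functions applied to the URLs from the question. The variable names are only for illustration.

```javascript
var url1 = "http://example.com/test this now";
var url2 = "http://example.com/more?stuff goes here";

// encodeURI keeps characters that have a meaning in a URL (: / ?),
// so only the spaces are escaped.
var viaEncodeURI = [encodeURI(url1), encodeURI(url2)];

// encodeURIComponent escapes those reserved characters as well,
// which is what the expected output in the question shows.
var viaEncodeURIComponent = [encodeURIComponent(url1), encodeURIComponent(url2)];

console.log(viaEncodeURI);
// [ 'http://example.com/test%20this%20now',
//   'http://example.com/more?stuff%20goes%20here' ]
console.log(viaEncodeURIComponent);
// [ 'http%3A%2F%2Fexample.com%2Ftest%20this%20now',
//   'http%3A%2F%2Fexample.com%2Fmore%3Fstuff%20goes%20here' ]
```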
RomanPerekhrest