var str = 'let us pretend that this is a blog about gardening&amp;cooking; here&amp;apos;s an apostrophe &amp;amp; ampersand just for fun.';

This is the string I'm operating on. The desired end result is: "let us pretend that this is a blog about gardening&cooking; here's an apostrophe & ampersand just for fun."

console.log('Before: ' + str);


str = str.replace(/&amp;(?:#x?)?[0-9a-z]+;?/gi, function(m){
  var d = document.createElement('div');
  console.log(m);
  d.innerHTML = m.replace(/&amp;/, '&');
  console.log(d.innerHTML + '|' + d.textContent);
  return !!d.textContent.match(m.replace(/&amp;/, '&')[0]) ? m : d.textContent;
});


console.log('After: ' + str);
wwaawaw
  • You have `!!` at the beginning of your return. I don't believe that is valid syntax, and if it is, I think it cancels itself out. – Shmiddty Sep 24 '12 at 17:05
  • @Shmiddty `!!` is used to cast an operand to a Boolean. It's valid syntax and I don't think it is related to the issue. – jbabey Sep 24 '12 at 17:07
  • [this question](http://stackoverflow.com/questions/1219860/javascript-jquery-html-encoding) may have some answers for you. HTML encoding is one of those things where you should reuse a proven solution instead of trying to roll your own. – jbabey Sep 24 '12 at 17:09
  • Nope... the "inner" (second) one converts it to a boolean expression (and, like you say, inverts it), and the "outer" (first) one re-(un-)inverts it. That's the idea, to cancel it out without having cumbersome nested parentheses. – wwaawaw Sep 24 '12 at 17:10
  • I'm not sure what the point of matching the first character of that de-`&amp;`ed string is; could you explain what you intend with that? – Bergi Sep 24 '12 at 17:49
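The two points raised in these comments can be tried in isolation. The following is only a sketch: `stillHasAmpersand` is a hypothetical helper that mirrors the check in the question's `return` line, and the `decodedText` arguments are assumed stand-ins for what `d.textContent` would hold after the browser decodes the entity.

```javascript
// !! coerces any value to a boolean: the inner ! negates, the outer ! negates back.
console.log(!!null);   // false (what match() returns when nothing is found)
console.log(!!['&']);  // true  (match() returned an array)

// Hypothetical helper mirroring the question's check, without the DOM.
function stillHasAmpersand(match, decodedText) {
  // After stripping the leading "&amp;", the first character is always "&".
  var firstChar = match.replace(/&amp;/, '&')[0];
  return !!decodedText.match(firstChar);
}

// Assumed decoded values: "&apos;" decodes to "'", "&amp;" decodes to "&".
console.log(stillHasAmpersand('&amp;apos;', "'")); // false
console.log(stillHasAmpersand('&amp;amp;', '&'));  // true
```

Inside a ternary condition the `!!` makes no difference, since the condition is coerced to a boolean anyway. Under these assumptions, the check cannot tell an undecoded entity apart from a correctly decoded `&`, because both still contain an ampersand.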

2 Answers


The problem is that HTML doesn't support XML's `&apos;`. To avoid the issue, you should use `&#39;` instead of `&apos;`.

For more information look at this post:

Why shouldn't `&apos;` be used to escape single quotes?
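As an illustration of that advice, here is a minimal escaping sketch that emits the numeric reference `&#39;` for single quotes rather than `&apos;`. The names `HTML_ESCAPES` and `escapeHtml` are hypothetical, not taken from any library.

```javascript
// Hypothetical lookup table: every character that needs escaping in HTML text
// or attribute values, with the single quote mapped to the numeric &#39;.
var HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

// Hypothetical helper: escapes a value for safe insertion into HTML.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, function (ch) {
    return HTML_ESCAPES[ch];
  });
}

console.log(escapeHtml("here's an apostrophe & ampersand"));
// here&#39;s an apostrophe &amp; ampersand
```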

Warlock
  • That definitely isn't the issue. The section containing `&apos;` is working just fine, exactly the way I want it to. What's not working is `&amp;`. – wwaawaw Sep 24 '12 at 17:59
  • Okay, maybe you just need a simpler function like this: `str = str.replace(/&amp;/gi, function(m){ console.log(m); return m.replace(/&amp;/, '&'); });` – Warlock Sep 24 '12 at 18:25

This should do what you want:

str.replace(/&amp;([#x]\d+;|[a-z]+;)/g, "&$1")

or, with a positive lookahead:

str.replace(/&amp;(?=[#x]\d+;|[a-z]+;)/g, "&")

I don't think you need any HTML2text en-/decoding.
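A slightly stricter variant of the lookahead approach, as a sketch only: `unescapeOnce` and its pattern are hypothetical and assume that a double-escaped entity is `&amp;` followed by a decimal (`#39;`), hexadecimal (`#x27;`), or named (`apos;`) entity body.

```javascript
// Hypothetical helper: removes one level of escaping from anything that
// looks like a double-escaped entity, using only string operations.
function unescapeOnce(input) {
  // "&amp;" followed by a decimal, hex, or named entity body.
  var doubleEscaped = /&amp;(?=#\d+;|#x[0-9a-f]+;|[a-z][a-z0-9]*;)/gi;
  return input.replace(doubleEscaped, '&');
}

var sample = 'here&amp;apos;s an apostrophe &amp;amp; ampersand, &amp;#39; and &amp;#x27;';
console.log(unescapeOnce(sample));
// here&apos;s an apostrophe &amp; ampersand, &#39; and &#x27;
```

A pattern like this cannot tell real entity names from lookalikes, so input such as `gardening&amp;cooking;` would also lose its `amp;`. If that matters, the named branch could be restricted to a list of known entity names.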

Bergi