
I want to get the right video snippet title that doesn't include special characters. I am using the API:

https://www.googleapis.com/youtube/v3/search,

with the part snippet.

Currently, I am getting the snippet.title below:

I&#39;M GONNA CARRY HER!!! Fortnite With Karina!

I expected this title instead:

I'M GONNA CARRY HER!!! Fortnite With Karina!

stvar
James Lin

3 Answers


I'm using escape-goat as it operates as either a standalone function or as a tagged template literal, depending on your use case:

const {htmlUnescape} = require('escape-goat');

htmlUnescape("I&#39;M GONNA CARRY HER!!! Fortnite With Karina!");
//=> "I'M GONNA CARRY HER!!! Fortnite With Karina!"

htmlUnescape`Title: ${"I&#39;M GONNA CARRY HER!!! Fortnite With Karina!"}`;
//=> "Title: I'M GONNA CARRY HER!!! Fortnite With Karina!"

When encoding or decoding HTML, always be wary of potential XSS exploitation.
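To illustrate the XSS point, here is a minimal sketch of re-escaping a decoded title before interpolating it into markup. The escapeHtml helper and the sample title are hypothetical and not part of escape-goat or the YouTube API; in a browser, assigning the decoded string to textContent instead of innerHTML avoids the problem altogether.

```javascript
// Hypothetical helper: escapes the five characters that are significant
// in HTML text and attribute values.
const escapeHtml = (text) =>
  text.replace(/[&<>"']/g, (ch) => ({
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  }[ch]));

// Made-up example: a decoded title that happens to contain markup.
const decodedTitle = '<img src=x onerror=alert(1)> Fortnite With Karina!';

// Escape at the point of output, so the title renders as text, not as HTML.
const html = `<h2>${escapeHtml(decodedTitle)}</h2>`;
console.log(html);
//=> <h2>&lt;img src=x onerror=alert(1)&gt; Fortnite With Karina!</h2>
```

The idea is to keep the decoded title as plain text internally and escape it only at the moment it is written into HTML.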

Matt Hosch

First, note that what you got from the API are not, strictly speaking, special characters.

To be technically precise, those sequences of characters are HTML character references, also known as HTML entities.

The behavior you've encountered is a well-known issue of the API. I know of no solution other than replacing those HTML entities yourself with the actual characters they stand for.

That said, I recommend against an ad hoc solution. Instead, use a well-written, well-tested, well-known library whose non-trivial implementation carefully conforms to the current HTML standard.

In my opinion, Mathias Bynens' library is evidently a tool that meets each of the criteria I mentioned above:

he

he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and — contrary to many other JavaScript solutions — he handles astral Unicode symbols just fine. An online demo is available.

stvar

If you want plain JavaScript without importing a library, here is something I came across that works for the simple case you presented. The regex strips the &# and ; delimiters to capture the decimal number between them, and String.fromCharCode returns the character for that UTF-16 code unit. Note that it only handles decimal numeric references (such as &#39;), not named or hexadecimal ones.

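// Decode decimal numeric character references such as &#39;
const unescape = (str) => {
  // dec is the captured digit string; convert it to a number before the lookup
  return str.replace(/&#(\d+);/g, (match, dec) => String.fromCharCode(Number(dec)))
}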
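If you need a bit more coverage without a library, the same approach can be extended. The sketch below is only an illustration under stated assumptions: decodeEntities, fromCodePointOrKeep and the tiny NAMED_ENTITIES table are hypothetical names, and the table covers just five common entities rather than the full HTML list. It uses String.fromCodePoint so that code points above U+FFFF decode correctly.

```javascript
// Assumption: only these five named entities are supported; the real
// HTML list has thousands of entries, so unknown names are left as-is.
const NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

// String.fromCodePoint throws above U+10FFFF, so keep the original text then.
const fromCodePointOrKeep = (codePoint, original) =>
  codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : original;

// Decodes decimal (&#39;), hexadecimal (&#x27;) and a few named (&amp;) references
// in a single pass, so already-decoded output is never decoded a second time.
const decodeEntities = (str) =>
  str.replace(
    /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([a-zA-Z]+));/g,
    (match, dec, hex, name) => {
      if (dec !== undefined) return fromCodePointOrKeep(parseInt(dec, 10), match);
      if (hex !== undefined) return fromCodePointOrKeep(parseInt(hex, 16), match);
      return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
        ? NAMED_ENTITIES[name]
        : match;
    }
  );

console.log(decodeEntities('I&#39;M GONNA CARRY HER!!! Fortnite With Karina!'));
//=> I'M GONNA CARRY HER!!! Fortnite With Karina!
```

This still does not handle edge cases such as references without a trailing semicolon, which is where a standards-conforming library earns its keep.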

As Matt Hosch mentioned in his answer, you'll want to sanitize any data you receive to prevent XSS.