YouTube videos API snippet title contains special characters in Next.js

Question

I want to get the right video snippet title that doesn't include special characters. I am using the API:

https://www.googleapis.com/youtube/v3/search,

with the part snippet.

Currently, I am getting the snippet.title below:

I&#39;M GONNA CARRY HER!!! Fortnite With Karina!

I expected this title instead:

I'm gonna carry her!!! Fortnite With Karina!

https://stackoverflow.com/questions/5796718/html-entity-decode — chandan_kr_jha, Mar 13 '20 at 10:40

score 4 · Answer 1 · answered Aug 07 '20 at 17:09

I'm using escape-goat as it operates as either a standalone function or as a tagged template literal, depending on your use case:

const {htmlUnescape} = require('escape-goat');

htmlUnescape("I&#39;M GONNA CARRY HER!!! Fortnite With Karina!");
//=> "I'M GONNA CARRY HER!!! Fortnite With Karina!"

htmlUnescape`Title: ${"I&#39;M GONNA CARRY HER!!! Fortnite With Karina!"}`;
//=> "Title: I'M GONNA CARRY HER!!! Fortnite With Karina!"

When dealing with html encode/decode, always be wary of potential XSS exploitation.
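
To make the XSS caveat concrete: a decoded title is plain text and can contain characters such as < or &. The sketch below assumes the title is concatenated into an HTML string by hand; escapeHtml is a hypothetical helper written for this example, not part of escape-goat. In React/Next.js, rendering the value as {title} in JSX already escapes it, so a helper like this mainly matters when building markup manually or passing it to dangerouslySetInnerHTML.

```javascript
// Hypothetical helper (not from any library): escape the five characters
// that are significant in HTML text and attribute values.
const escapeHtml = (text) =>
  text
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');

// A decoded title from an untrusted source, used here only as sample data.
const decodedTitle = `<img src=x onerror=alert(1)> I'M GONNA CARRY HER!!!`;

// The markup characters are neutralised before concatenation.
const html = `<h2>${escapeHtml(decodedTitle)}</h2>`;
console.log(html);
```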

stvar · Accepted Answer · 2021-02-25T09:28:48.250

First, note that what you are getting from the API are not, strictly speaking, special characters.

To be technically precise, those sequences of characters are HTML character references, also known as HTML entities.

The behavior you've encountered is a well-known issue of the API. I know of no solution other than replacing those HTML entities yourself with the actual characters they stand for.

I recommend against an ad hoc solution. Instead, use a well-written, well-tested, well-known library whose non-trivial implementation conforms to the current HTML standard.

In my opinion, Mathias Bynens' library meets each of the criteria mentioned above:

he

he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and — contrary to many other JavaScript solutions — he handles astral Unicode symbols just fine. An online demo is available.
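
Whichever decoder is chosen, it still has to be applied to each title in the search response. The following is a minimal sketch of that step, assuming the request used part=snippet so that every item has a snippet.title. The names decodeTitles and demoDecode are hypothetical; demoDecode is a dependency-free stand-in that only handles &#39; and &amp;, and a real decoder such as he.decode or escape-goat's htmlUnescape would be passed in its place.

```javascript
// Hypothetical helper: returns a copy of a search response in which every
// items[].snippet.title has been passed through `decode`.
// `decode` is any entity decoder, e.g. he.decode or escape-goat's htmlUnescape.
function decodeTitles(searchResponse, decode) {
  const items = (searchResponse.items || []).map((item) => ({
    ...item,
    snippet: { ...item.snippet, title: decode(item.snippet.title) },
  }));
  return { ...searchResponse, items };
}

// Stand-in decoder so the sketch runs without dependencies. It only handles
// &#39; and &amp;, so swap in a real library for actual use.
const demoDecode = (s) => s.replace(/&#39;/g, "'").replace(/&amp;/g, '&');

// Trimmed sample shaped like the API response (sample data, not real output).
const response = {
  kind: 'youtube#searchListResponse',
  items: [{ snippet: { title: 'I&#39;M GONNA CARRY HER!!! Fortnite With Karina!' } }],
};

const decoded = decodeTitles(response, demoDecode);
console.log(decoded.items[0].snippet.title);
```

The helper copies the objects instead of mutating them, so the original response stays available for caching or debugging.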

score 2 · Answer 3 · answered Feb 24 '21 at 15:07

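If you want to use plain JS and not import a library, I came across something that works for the simple use case you presented. It uses a regular expression to strip the &# and ; delimiters and capture the decimal number between them, which is treated as a UTF-16 code unit. String.fromCharCode returns the character for that number. Note that it only handles decimal numeric references such as &#39;. Named references such as &amp; and hexadecimal ones such as &#x27; are left unchanged, and code points above 0xFFFF are not decoded correctly.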

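// Replace decimal numeric character references (e.g. &#39;) with their characters.
// Note: this name shadows the deprecated global unescape() in the enclosing scope.
const unescape = (str) => {
  return str.replace(/&#(\d+);/g, (match, dec) => String.fromCharCode(Number(dec)))
}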
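
If the titles may also contain hexadecimal or named references, the same regex-replace idea can be extended. The sketch below is an illustration under stated assumptions, not a standards-conforming decoder: decodeEntities and the NAMED table are hypothetical names, the table is deliberately incomplete, and anything it does not recognise is left untouched. It uses String.fromCodePoint so that code points above 0xFFFF decode correctly.

```javascript
// Sketch: decode numeric character references (decimal and hex) plus a small,
// deliberately incomplete table of named references.
const NAMED = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

const decodeEntities = (str) =>
  str.replace(/&(#\d+|#[xX][0-9a-fA-F]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] !== '#') {
      // Named reference: decode only if it is in the table, else keep as-is.
      return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
    }
    const isHex = body[1] === 'x' || body[1] === 'X';
    const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    // Leave out-of-range values untouched instead of throwing a RangeError.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });

console.log(decodeEntities('I&#39;M GONNA CARRY HER!!!'));
console.log(decodeEntities('Tom &amp; Jerry &#x1F600; &nbsp;'));
```

Because everything happens in a single pass, input such as &amp;#39; decodes to the literal text &#39; and is not decoded a second time.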

As Matt Hosch mentioned in his answer, you'd want to sanitize any data you receive to prevent XSS.
