I'm writing web pages in Markdown and converting them to HTML using the md2html tool. I want to process the output HTML file and find any YouTube link like this:

<a href="https://www.youtube.com/watch?v=abcdefgh887">https://www.youtube.com/watch?v=abcdefgh887</a>

and replace it with the embed code:

<iframe width="560" height="315" src="https://www.youtube.com/embed/abcdefgh887?controls=0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

I toyed around a little with Grammars, mostly to get familiar with them, but concluded they probably aren't the ideal tool for the job. I'd also prefer to use existing modules that are easily adaptable to other similar tasks rather than roll my own half-baked solution.

Perl 5 has some good tools for this kind of thing, but I'd like to use a pure Raku solution so I can learn more Raku.

Any recommendations for good approaches to this problem?

Mustafa Aydın
StevieD
  • cf [my answer](https://stackoverflow.com/a/45181464/1077672) to what strikes me as a similar question to yours. – raiph Jan 29 '22 at 01:58

2 Answers

OK, found exactly what I needed for this job with DOM::Tiny.

Use it something like this:

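    use DOM::Tiny;

    my $dom = DOM::Tiny.parse($html);
    # Visit every anchor that has an href attribute
    for $dom.find('a[href]') -> $e {
        # Only rewrite links whose text is a YouTube URL
        if $e.text ~~ /^ 'https://www.youtube.com' / {
            # get_youtube_embed is your own helper that returns the iframe markup
            my $embed = get_youtube_embed($e.text);
            $e.replace($embed)
        }
    }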
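
The snippet above calls get_youtube_embed without showing it. As a rough sketch only, such a helper might look like the following, assuming the URL carries the video ID in a v= query parameter and that the ID consists of word characters and hyphens. The body is an illustration, not the code actually used in this answer.

```raku
# Hypothetical helper (not part of DOM::Tiny): turns a watch URL such as
# https://www.youtube.com/watch?v=abcdefgh887 into the iframe embed markup.
sub get_youtube_embed(Str $url) {
    # Assumption: the video ID is the value of the v= query parameter
    # and contains only word characters and hyphens.
    $url ~~ / 'v=' (<[\w\-]>+) / or die "No video ID found in $url";
    my $id = ~$0;

    # Same attributes as the embed code quoted in the question.
    return '<iframe width="560" height="315" '
         ~ 'src="https://www.youtube.com/embed/' ~ $id ~ '?controls=0" '
         ~ 'title="YouTube video player" frameborder="0" '
         ~ 'allow="accelerometer; autoplay; clipboard-write; encrypted-media; '
         ~ 'gyroscope; picture-in-picture" allowfullscreen></iframe>';
}

say get_youtube_embed('https://www.youtube.com/watch?v=abcdefgh887');
```

Keeping the markup generation in its own sub means the loop stays the same if you later swap in a different helper, for example one that builds a tweet embed.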
raiph
StevieD

I tried to answer your question without having an example input file.

You need to extract the YouTube ID from the A tag and then replace the A tag with an iframe tag.

The pseudocode is:

for each line:
   if is youtube A tag,
       youtube A tag = youtube Iframe tag

Paste the contents of your input file into the input variable.

    // Sample input: replace with the contents of your HTML file
    const input = `
    text1
    <a href="https://www.youtube.com/watch?v=youtubeId1">https://www.youtube.com/watch?v=youtubeId1</a>
    text2
    text3
    <a href="https://www.youtube.com/watch?v=youtubeId2">https://www.youtube.com/watch?v=youtubeId2</a>
    `;

    // Captures the video ID from several YouTube URL shapes
    const rx = /^.*(?:(?:youtu\.be\/|v\/|vi\/|u\/\w\/|embed\/)|(?:(?:watch)?\?v(?:i)?=|\&v(?:i)?=))([^#\&\?<]*).*/;

    // Builds the iframe embed markup for a given video ID
    const getYoutubeIframe = (youtubeId) => {
       return `<iframe width="560" height="315" src="https://www.youtube.com/embed/${youtubeId}?controls=0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>`;
    }

    // Replace each line that starts with a YouTube A tag; keep other lines as-is
    const output = input.split('\n').map(line => {
       const youtubeLink = '<a href="https://www.youtube.com/watch?v=';
       if (line.trim().indexOf(youtubeLink) === 0) {
          const youtubeId = line.match(rx)[1];
          return getYoutubeIframe(youtubeId);
       }
       return line;
    }).join('\n');

    console.log(output);
Alon Shmiel
  • Thanks, but I was trying to avoid a one-off solution in favor of one that is easily adaptable to similar problems, like embedding tweets on a page. I'm looking for a module I might cobble together to make this easy without resorting to regexing the HTML. Something that parses and then lets me manipulate the DOM. – StevieD Jan 28 '22 at 20:55