I'm trying to find all the YouTube video IDs in a playlist's page source, but I'm not very familiar with regex, so it's quite difficult for me.

This is my current code:

    Console.Write("Playlist? Ex: \"PLaJlh8L9CwotfVy6fAtlphD_JD6IgSTMx\": ");
    string playlist = Console.ReadLine();
    string source = client.DownloadString("http://www.youtube.com/playlist?list=" + playlist);

    Regex reg = new Regex(".*?href=\"/watch\\?v=(?<vid>.+?)&amp;list=" + playlist);
    MatchCollection mc1 = reg.Matches(source);
    foreach (Match m in mc1)
    {
        string vid = m.Groups["vid"].Value;
        Console.WriteLine(m);
        Console.ReadLine();
    }

I want it to loop through the source and display every video ID it finds. An example of a video ID is "EzuvVs953Gs" in "https://www.youtube.com/watch?v=EzuvVs953Gs".

So far it finds everything that contains a video ID, but it displays the entire line. I want it to display only the ID. I also want it to check whether it has already found an ID and, if so, skip to the next one, so that each ID is displayed only once.

dg-

1 Answer

You are writing the whole match object. Instead of Console.WriteLine(m); use Console.WriteLine(vid);.

vid holds the value of the named group that captures the video ID.
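The question also asks to skip IDs that were already found. A minimal sketch of one way to do that is to track seen IDs in a HashSet<string>, whose Add method returns false for duplicates. The class name PlaylistIds, the method Extract, the sample markup, and the second video ID are hypothetical, made up for illustration. The pattern also differs from the one in the question in two assumed ways: it drops the leading .*? so the match no longer spans the whole line, and it passes the playlist ID through Regex.Escape.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaylistIds
{
    // Returns each distinct video ID in the order it first appears.
    public static List<string> Extract(string source, string playlist)
    {
        // Regex.Escape keeps any regex metacharacters in the playlist ID literal.
        var reg = new Regex(
            "href=\"/watch\\?v=(?<vid>[^&\"]+)&amp;list=" + Regex.Escape(playlist));

        var seen = new HashSet<string>();
        var ids = new List<string>();

        foreach (Match m in reg.Matches(source))
        {
            string vid = m.Groups["vid"].Value;

            // HashSet<T>.Add returns false when the value is already present.
            if (seen.Add(vid))
            {
                ids.Add(vid);
            }
        }

        return ids;
    }

    public static void Main()
    {
        string playlist = "PLaJlh8L9CwotfVy6fAtlphD_JD6IgSTMx";

        // Hypothetical sample markup standing in for the downloaded page source.
        string source =
            "<a href=\"/watch?v=EzuvVs953Gs&amp;list=" + playlist + "\">first</a>\n" +
            "<a href=\"/watch?v=EzuvVs953Gs&amp;list=" + playlist + "\">repeat</a>\n" +
            "<a href=\"/watch?v=AAAAAAAAAAA&amp;list=" + playlist + "\">second</a>\n";

        foreach (string vid in Extract(source, playlist))
        {
            Console.WriteLine(vid);
        }
    }
}
```

With the sample markup above, the repeated link is skipped and each ID is written once. In the original program, the string returned by client.DownloadString would be passed in place of the sample source.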

NOTE: Regex is not the best way to parse HTML. I suggest using a library such as HtmlAgilityPack.

Sergey Berezovskiy
  • I didn't even notice that. What an easy fix. Thanks. – dg- Mar 14 '16 at 23:13
  • Why do all professional programmers suggest *"don't use regex for parsing html"*? What's wrong with it? I like regex and always use it, and it all works fine. Is there any specific reason? – Shafizadeh Mar 14 '16 at 23:25
  • Parsing a whole document with regex is slower than using something designed to parse HTML. You can also get false positives if it matches something inside a comment tag or inside attributes. – joelnet Mar 14 '16 at 23:33
  • @Shafizadeh [question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) and [coding horror](http://blog.codinghorror.com/parsing-html-the-cthulhu-way/) – Sergey Berezovskiy Mar 14 '16 at 23:39