How to use regex to complete an element in HTML in C# properly?

Question

I used WebClient in C# to download the HTML of a YouTube video page. Now I'm trying to extract a YouTube comment from it, but it's not working because different comments that use the same element (yt-formatted-string) have different attributes (class, id, span, and so on). So I'm trying to get the regex to complete them for me, that is, to match whatever attributes are there and just get to the end of the tag (>).

I tried to use "." in the regex, much like with the re module in Python, e.g. re.compile(r'.'), where it matches spaces, symbols, and characters and just completes them for me. I'm not sure whether that even exists in C#, but I hope so.

        WebClient web = new WebClient();
        String content = web.DownloadString(@"https://www.youtube.com/watch?v=hE73JvEc2pQ");

        MatchCollection matches = Regex.Matches(content, @"<yt-formatted-string\.>\s*(.+?)\s*</yt-formatted-string>", RegexOptions.Multiline);
        foreach (Match match in matches)
        {
            textComment.Text = $"\n{match.Groups[1].Value}";
        }

Got nothing.

Want the Regex to complete attributes for me, like so:

HTML line:

<yt-formatted-string id="content-text" slot="content" split-lines="" class="style-scope ytd-comment-renderer">

Imaginary C# code that allows me to complete attributes:

"<yt-formatted-string(complete all the attributes here)>\s*(.+?)\s*</yt-formatted-string>"
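The "complete all the attributes" placeholder can be approximated with a negated character class: `[^>]*` matches any run of characters up to the closing `>` of the start tag. The following is only a minimal sketch under that assumption. The class and method names (`CommentRegexSketch`, `ExtractTexts`) are hypothetical, and the pattern can only find elements that are actually present in the downloaded string; if the page fills in its comments after loading, the HTML returned by WebClient may not contain them at all.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class CommentRegexSketch
{
    // (?:\s[^>]*)?  optionally matches whitespace followed by any attributes,
    //               stopping at the '>' that ends the start tag.
    // Singleline    lets '.' match newlines, so element text may span lines.
    //               (Multiline only changes the meaning of ^ and $.)
    private static readonly Regex Pattern = new Regex(
        @"<yt-formatted-string(?:\s[^>]*)?>\s*(.+?)\s*</yt-formatted-string>",
        RegexOptions.Singleline);

    // Returns the trimmed inner text of every matching element.
    public static List<string> ExtractTexts(string html)
    {
        var results = new List<string>();
        foreach (Match match in Pattern.Matches(html))
        {
            results.Add(match.Groups[1].Value);
        }
        return results;
    }
}
```

To show every match rather than only the last one, the results would be appended to the text box (for example with `string.Join("\n", CommentRegexSketch.ExtractTexts(content))`) instead of assigned inside the loop.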

For some reason the HTML elements (yt-formatted-string) keep getting deleted!! — Nizar K, Feb 01 '19 at 17:22
I'm not sure what you mean by "complete an element", but I would recommend something like XPath rather than regex to extract data from HTML. Check this out: https://learn.microsoft.com/en-us/dotnet/standard/data/xml/select-nodes-using-xpath-navigation — IPValverde, Feb 01 '19 at 17:25

score 1 · Answer 1 · answered Feb 01 '19 at 17:30

You don't need to deal with such complicated parsing. Just use the YouTube Data API.

Check This API
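As a rough illustration of what using the API could look like, the sketch below requests the `commentThreads` resource of the YouTube Data API v3 and reads the `textDisplay` field of each top-level comment. It assumes you have your own API key and a runtime that ships `System.Text.Json`; the class and method names are hypothetical, and error handling and paging are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
public static class YouTubeCommentsSketch
{
    // Builds a commentThreads.list request URL for one video.
    public static string BuildRequestUrl(string videoId, string apiKey)
    {
        return "https://www.googleapis.com/youtube/v3/commentThreads"
             + "?part=snippet"
             + "&videoId=" + Uri.EscapeDataString(videoId)
             + "&key=" + Uri.EscapeDataString(apiKey);
    }

    // Pulls the display text of each top-level comment out of the JSON response.
    public static List<string> ParseCommentTexts(string json)
    {
        var texts = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement item in doc.RootElement.GetProperty("items").EnumerateArray())
            {
                string text = item.GetProperty("snippet")
                                  .GetProperty("topLevelComment")
                                  .GetProperty("snippet")
                                  .GetProperty("textDisplay")
                                  .GetString();
                texts.Add(text ?? string.Empty);
            }
        }
        return texts;
    }

    // Downloads one page of comment threads and returns the comment texts.
    public static async Task<List<string>> FetchAsync(HttpClient http, string videoId, string apiKey)
    {
        string json = await http.GetStringAsync(BuildRequestUrl(videoId, apiKey));
        return ParseCommentTexts(json);
    }
}
```

Keeping the URL building and JSON parsing separate from the network call makes the parsing easy to check against a saved response without spending API quota.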

Derviş Kayımbaşıoğlu

score 0 · Answer 2 · answered Feb 01 '19 at 17:59

For cases where an API is not available, you should still avoid trying to parse HTML with a regex, and instead parse it as XML. See https://stackoverflow.com/a/1732454/6055952 for more information.
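A minimal sketch of the parse-as-XML approach, using `XDocument` and an XPath query from the .NET base class library. It assumes the markup is well-formed XML, which real-world HTML often is not, so a dedicated HTML parser may be needed in practice. The class and method names and the `id='content-text'` filter are illustrative choices based on the HTML line quoted in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper, not part of the original answer.
public static class CommentXPathSketch
{
    // Selects every yt-formatted-string element whose id is "content-text",
    // whatever other attributes it carries, and returns its trimmed text.
    // Throws XmlException if the input is not well-formed XML.
    public static List<string> ExtractTexts(string wellFormedMarkup)
    {
        XDocument doc = XDocument.Parse(wellFormedMarkup);
        return doc.XPathSelectElements("//yt-formatted-string[@id='content-text']")
                  .Select(element => element.Value.Trim())
                  .ToList();
    }
}
```

Because the query selects by element name and one attribute, the remaining attributes (class, slot, and so on) never have to be spelled out.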

Matthew Varga
