I'm writing a very basic program that first goes through the HTML of a website to find all RSS links, then puts those links into an array and parses the content of each link into an existing XML file.
However, I'm still learning C# and I'm not that familiar with all the classes yet. I have done all this in PHP by writing my own class with get_file_contents() and also by using cURL to do the work. I managed to do it in Java as well. Now I'm trying to achieve the same result in C#, but I think I'm doing something wrong here.
TL;DR: What's the best way to write the regex to find all RSS links on a website?
So far, my code looks like this:
private List<string> getRSSLinks(string websiteUrl)
{
    List<string> links = new List<string>();
    MatchCollection collection = Regex.Matches(websiteUrl, @"(<link.*?>.*?</link>)", RegexOptions.Singleline);
    foreach (Match singleMatch in collection)
    {
        string text = singleMatch.Groups[1].Value;
        Match matchRSSLink = Regex.Match(text, @"type=\""(application/rss+xml)\""", RegexOptions.Singleline);
        if (matchRSSLink.Success)
        {
            links.Add(text);
        }
    }
    return links;
}
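For comparison, here is a minimal sketch of how the same idea might look with a few likely problems addressed. It assumes the page's HTML has already been downloaded into a string, since the method above runs the regex against the URL itself rather than the page content. It also assumes feeds are advertised with self-closing `<link ...>` tags (HTML has no `</link>`), escapes the `+` in `application/rss+xml` (unescaped, it is a quantifier), and returns the `href` value rather than the whole tag. The names `RssLinkFinder` and `GetRssLinks` are hypothetical, and a real HTML parser would probably be more robust than regex here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class RssLinkFinder
{
    // <link> is a void element, so match the single tag up to its closing '>'.
    private static readonly Regex LinkTag =
        new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);

    // The '+' must be escaped; single or double quotes are both accepted.
    private static readonly Regex RssType =
        new Regex(@"type\s*=\s*[""']application/rss\+xml[""']", RegexOptions.IgnoreCase);

    // Captures the attribute value of href in group 1.
    private static readonly Regex Href =
        new Regex(@"href\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);

    // Takes the page's HTML (not its URL) and returns the feed URLs it advertises.
    public static List<string> GetRssLinks(string html)
    {
        var links = new List<string>();
        foreach (Match tag in LinkTag.Matches(html))
        {
            if (!RssType.IsMatch(tag.Value))
            {
                continue; // e.g. stylesheets or icons
            }

            Match href = Href.Match(tag.Value);
            if (href.Success)
            {
                links.Add(href.Groups[1].Value);
            }
        }
        return links;
    }
}
```

To use it, the HTML would first have to be fetched, for example with `new System.Net.WebClient().DownloadString(websiteUrl)`, and the resulting string passed to `GetRssLinks`. Relative `href` values would still need to be resolved against the page URL.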