I have HTML files and need to extract all the links and images from them. Specifically, I need:
<a href="this_is_what_I_need">
and <img src="this_is_also_needed">
I read the files line by line and can extract a value, but only the first one on each line:
List<string> links = new List<string>();
if (line.Contains(@"<a href=""") || line.Contains(@"<img src="""))
{
    if (line.Contains(@"<a href="""))
    {
        // Take the text between <a href=" and the next double quote.
        links.Add(line.Split(new string[] { @"<a href=""" }, StringSplitOptions.None)[1].Split('"')[0]);
    }
    else
    {
        // Take the text between <img src=" and the next double quote.
        links.Add(line.Split(new string[] { @"<img src=""" }, StringSplitOptions.None)[1].Split('"')[0]);
    }
}
But a line might contain multiple links and/or images. How can I get all of them?
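
One direction that might work is to let a regular expression walk over every match instead of splitting once. The following is only a rough sketch, not a definitive solution: the class name `LinkExtractor` and the method `ExtractAll` are hypothetical, and the pattern assumes attribute values are always wrapped in double quotes, as in the examples above. It relies on `Regex.Matches`, which returns every non-overlapping match rather than just the first.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class LinkExtractor
{
    // Captures the double-quoted value of href in <a ...> tags and of src in <img ...> tags.
    // Assumption: values are always double-quoted; other attributes may precede href/src.
    private static readonly Regex UrlPattern = new Regex(
        @"<(?:a\s[^>]*?href|img\s[^>]*?src)\s*=\s*""([^""]*)""",
        RegexOptions.IgnoreCase);

    // Returns every href/src value found in the input, in order of appearance.
    public static List<string> ExtractAll(string html)
    {
        var links = new List<string>();
        foreach (Match match in UrlPattern.Matches(html))
        {
            links.Add(match.Groups[1].Value);
        }
        return links;
    }
}
```

Inside the existing line loop this could be called as `links.AddRange(LinkExtractor.ExtractAll(line));`. Passing the whole file contents at once should also pick up tags that are broken across several lines. A regular expression will not cope with every form of real-world HTML (single-quoted or unquoted attributes, commented-out markup), so a dedicated HTML parser is likely the more robust choice for messy input.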