HTML searcher only returns 1 link

Question

I have a method that is supposed to return every download link in a separate HTML file in my folder. Unfortunately, it only returns one of the several links in that file.

Here is the method

private string GetHTMLDownloadLinks(string url, char SplitChhar, string serach, int index)
{

    //Initiates a new instance of WEbClient class
    WebClient WC = new WebClient();

    try
    {

        //Initiates a new stream instance with a url
        Stream stream = WC.OpenRead(url);


        //Initiates a streamreader to read the url parsed 
        StreamReader reader = new StreamReader(stream);

        string line;


        //Loops through the specifed url's html source 
        //and read every line
        while ((line = reader.ReadLine()) != null)
        {

            //If it finds the specified character that the user passed
            if (line.IndexOf(serach) != -1)
            {
                //it adds it to the parts variable 
                string[] parts = line.Split(SplitChhar);

                //Returns the index of the found 
                return parts[index];

            }
        }
    }
    catch (Exception Ex)
    {


        MessageBox.Show($"There seems to be a problem: {Ex}", "I am an error", MessageBoxButton.OKCancel, MessageBoxImage.Error);

    }

    return "" + "\n";

}

I suspect the error is in the loop, because it only runs until it finds the first match and then stops.

This is how I call the method:

TxtBox_WebInfo.Text += GetHTMLDownloadLinks(@"Link to the HTML file", '"', "download", 1);

Edit: Here is the body of the HTML. It's the only place that has any kind of link.

    <body>

    <div>

        <h1>Download indexer for *app name*</h1>

        <img src="https://img2.cgtrader.com/items/56921/04705862f7/white-teapot-3d-model-obj-blend-mtl.png" />


        <a href="https://dl0.png" download>Download this image</a>

        <a href="https://dl1.png" download>asdg</a>

        <a href="https://dl2.png" download>asgsdg</a>

    </div>
</body>

It only returns the first one.

Tell me if anything is missing

You return immediately after you find the first link. That's the problem. — Tobias Theel, Aug 31 '17 at 10:43

Tobias Theel · Answer 1 · 2017-08-31T11:22:27.443

Edit:

You could use XPath to select the elements that contain the download links. Once you have those elements, you could use a regex to parse their content and extract the links.
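As a rough illustration of the regex half of that suggestion, here is a minimal sketch that pulls the `href` value out of every `<a>` tag carrying a `download` attribute. It uses only a regular expression, not XPath, and it assumes the `href` is double-quoted and appears before `download`, as in the HTML from the question. The class name `DownloadLinkFinder` and the method `FindDownloadLinks` are made up for this example. Regexes are fragile on arbitrary HTML, so treat this as a sketch for simple, known markup rather than a general parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DownloadLinkFinder
{
    // Matches an <a ...> tag with a double-quoted href followed by a `download` attribute.
    // Assumption: href comes before download, as in the question's HTML.
    private static readonly Regex DownloadAnchor = new Regex(
        "<a\\s[^>]*?href=\"(?<url>[^\"]*)\"[^>]*?\\sdownload\\b[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns the href of every matching anchor, not just the first one.
    public static List<string> FindDownloadLinks(string html)
    {
        var links = new List<string>();
        foreach (Match match in DownloadAnchor.Matches(html))
        {
            links.Add(match.Groups["url"].Value);
        }
        return links;
    }

    public static void Main()
    {
        // Inline sample modelled on the question's HTML body
        string html =
            "<img src=\"https://example.invalid/teapot.png\" />\n" +
            "<a href=\"https://dl0.png\" download>Download this image</a>\n" +
            "<a href=\"https://dl1.png\" download>asdg</a>\n" +
            "<a href=\"https://dl2.png\" download>asgsdg</a>";

        foreach (string link in FindDownloadLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

Because the pattern is applied to the whole document string, it does not depend on each link sitting on its own line.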

You want to save all occurrences of download links and return a collection, instead of returning only the first one you find.

private ICollection<string> GetHTMLDownloadLinks(string url, char SplitChhar, string serach, int index) {

  // Collects every matching link instead of returning the first one
  ICollection<string> result = new List<string>();

  try {

    // Create the WebClient, open the URL as a stream, and wrap it in a reader;
    // the using blocks dispose all three when the loop finishes
    using (WebClient WC = new WebClient())
    using (Stream stream = WC.OpenRead(url))
    using (StreamReader reader = new StreamReader(stream)) {

      string line;

      // Read the HTML source line by line
      while ((line = reader.ReadLine()) != null) {

        // Only look at lines that contain the search text
        if (line.IndexOf(serach) != -1) {
          // Split the line on the given character
          string[] parts = line.Split(SplitChhar);

          // Add the requested part and keep looping; skip lines with too few parts
          if (index >= 0 && index < parts.Length) {
            result.Add(parts[index]);
          }

        }
      }
    }
  } catch (Exception Ex) {

    // Report the error here, for example with the MessageBox call from the question:
    //MessageBox.Show($"There seems to be a problem: {Ex}", "I am an error", MessageBoxButton.OKCancel, MessageBoxImage.Error);

  }

  return result;

}

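Alternative: yield return

The following solution makes use of yield return, which produces the links lazily as the caller enumerates them. C# does not allow yield return inside a try block that has a catch clause, so the try/catch no longer wraps the whole method.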

private IEnumerable<string> GetHTMLDownloadLinks(string url, char SplitChhar, string serach, int index) {

  // Create the WebClient, open the URL as a stream, and wrap it in a reader;
  // the using blocks dispose all three when enumeration ends
  using (WebClient WC = new WebClient())
  using (Stream stream = WC.OpenRead(url))
  using (StreamReader reader = new StreamReader(stream)) {

    string line;

    // Read the HTML source line by line
    while ((line = reader.ReadLine()) != null) {

      // Only look at lines that contain the search text
      if (line.IndexOf(serach) != -1) {
        // Split the line on the given character
        string[] parts = line.Split(SplitChhar);

        // Yield the requested part, or an empty string if the line has too few parts
        yield return (index >= 0 && index < parts.Length) ? parts[index] : string.Empty;

      }
    }
  }
}
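If you want to keep the error handling while still using an iterator, one option is to leave the iterator free of try/catch and wrap it in an ordinary method that forces enumeration inside a try block. The following is a minimal sketch of that idea, not a drop-in replacement: it reads from a `TextReader` instead of a URL so it can run without a network, and the names `SafeLinkReader`, `ReadLinks`, and `ReadLinksSafely` are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SafeLinkReader
{
    // Iterator: runs lazily and contains no try/catch around the yield.
    private static IEnumerable<string> ReadLinks(TextReader reader, char splitChar, string search, int index)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Skip lines that do not contain the search text
            if (line.IndexOf(search, StringComparison.Ordinal) == -1) continue;

            string[] parts = line.Split(splitChar);

            // Skip lines that have too few parts instead of throwing
            if (index >= 0 && index < parts.Length) yield return parts[index];
        }
    }

    // Ordinary method: ToList() forces the iterator to run inside the try block,
    // so read errors are caught here.
    public static List<string> ReadLinksSafely(TextReader reader, char splitChar, string search, int index)
    {
        try
        {
            return ReadLinks(reader, splitChar, search, index).ToList();
        }
        catch (IOException ex)
        {
            Console.Error.WriteLine($"Could not read the HTML: {ex.Message}");
            return new List<string>();
        }
    }

    public static void Main()
    {
        // Inline sample modelled on the question's HTML
        string html =
            "<a href=\"https://dl0.png\" download>a</a>\n" +
            "<a href=\"https://dl1.png\" download>b</a>";

        using (var reader = new StringReader(html))
        {
            foreach (string link in ReadLinksSafely(reader, '"', "download", 1))
            {
                Console.WriteLine(link);
            }
        }
    }
}
```

Either way, the result is a sequence and has to be enumerated or joined before it is displayed. Appending the sequence itself to a string calls ToString() on the iterator object, which gives the name of a compiler-generated type rather than the links.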

Thanks for the quick answer, but is there a way to return `results` without abandoning the try/catch? — , Aug 31 '17 at 10:56
It doesn't return the HTML links; it returns something like `DownloadManager_v2._5_1.MainWindow+d__7` — , Aug 31 '17 at 11:07
But I did not change anything in your "search" logic. It now returns every match it finds instead of only the first match. — Tobias Theel, Aug 31 '17 at 11:09
Yes, but I need it to return every HTML download link that is present in my HTML file — , Aug 31 '17 at 11:11
I wanted to use regex, but people kept telling me I need HTML Agility Pack or something like that, so I quit that idea — , Aug 31 '17 at 11:27

score 0 · Accepted Answer · answered Aug 31 '17 at 11:57

Try this:

private IEnumerable<string> GetHTMLDownloadLinks(string url, char SplitChhar, string serach, int index)
{
    // Collects every matching link
    List<string> links = new List<string>();
    try
    {
        // Create the WebClient, open the URL as a stream, and wrap it in a reader;
        // the using blocks dispose all three when the loop finishes
        using (WebClient WC = new WebClient())
        using (Stream stream = WC.OpenRead(url))
        using (StreamReader reader = new StreamReader(stream))
        {
            string line;

            // Read the HTML source line by line
            while ((line = reader.ReadLine()) != null)
            {
                // Only look at lines that contain the search text
                if (line.IndexOf(serach) != -1)
                {
                    // Split the line on the given character
                    string[] parts = line.Split(SplitChhar);

                    // Add the requested part and keep looping; skip lines with too few parts
                    if (index >= 0 && index < parts.Length)
                    {
                        links.Add(parts[index]);
                    }
                }
            }
        }
    }
    catch (Exception Ex)
    {
        MessageBox.Show($"There seems to be a problem: {Ex}", "I am an error", MessageBoxButton.OKCancel, MessageBoxImage.Error);
    }

    return links;
}

var links = GetHTMLDownloadLinks(@"Link to the HTML file", '"', "download", 1);
foreach (var link in links)
    TxtBox_WebInfo.Text += link + Environment.NewLine;
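If you would rather not write the loop at the call site, `string.Join` can build the same text in one call. This is a small sketch with placeholder links standing in for the method's result; `LinkFormatter` and `FormatLinks` are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;

public static class LinkFormatter
{
    // Joins the links into one string, one link per line
    public static string FormatLinks(IEnumerable<string> links)
    {
        return string.Join(Environment.NewLine, links);
    }

    public static void Main()
    {
        // Placeholder values standing in for the result of GetHTMLDownloadLinks
        var links = new List<string> { "https://dl0.png", "https://dl1.png", "https://dl2.png" };

        Console.WriteLine(FormatLinks(links));
    }
}
```

In the window code this would become something like `TxtBox_WebInfo.Text += LinkFormatter.FormatLinks(links);`. The join still iterates over the links internally, so the loop moves into the helper rather than disappearing.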

Thank you very much, this is exactly what I was looking for. I will try to improve it and somehow make it so I won't have to type the loop inside the MainWindow function. — , Aug 31 '17 at 14:29
Of course you will have to loop through the links one way or another if you intend to display them all in a single TextBlock... — mm8, Aug 31 '17 at 14:32
Thanks again for the advice. Yeah, it totally slipped my mind. — , Aug 31 '17 at 14:48
