
I'm trying to get the URL of an image. At the moment I have this code, which works but needs a WebBrowser control to do so.

    public void getFileUrl(HtmlDocument htmlDocument)
    {
        HtmlElementCollection htmlCollectionImage = htmlDocument.Images;
        foreach (HtmlElement htmlImage in htmlCollectionImage)
        {
            string Url = htmlImage.GetAttribute("src");
            if (Url.StartsWith("http://www.exemple.com/"))
            {
                MessageBox.Show(Url);
            }
        }
    }

I need to piece something together that doesn't require the WebBrowser control, but I really don't know how to do that.

Also, instead of feeding an HtmlDocument to the method, I need to feed it a simple string.

Is there an alternative?

BlackTigerX
    Possible duplicate of [Get HTML code from a website C#](http://stackoverflow.com/questions/16642196/get-html-code-from-a-website-c-sharp) – Sievajet Dec 02 '15 at 22:56

1 Answer


Try something like this:

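    using System;
    using System.Collections.Generic;
    using System.Linq;
    using HtmlAgilityPack; // NuGet package

    class Program
    {
        static void Main()
        {
            // The second argument is the prefix each image URL must start with.
            var fileUrls = GetFileUrls(@"https://stackoverflow.com/questions/34054662/get-a-file-url-without-webbrowser-c-sharp", @"https://www.gravatar.com/");

            foreach (string url in fileUrls)
            {
                Console.WriteLine(url);
            }

            Console.ReadKey();
        }

        public static IEnumerable<string> GetFileUrls(string url, string pattern)
        {
            // Download and parse the page; no WebBrowser control is needed.
            var document = new HtmlWeb().Load(url);
            var urls = document.DocumentNode.Descendants("img")
                               .Select(e => e.GetAttributeValue("src", null))
                               // Skip <img> tags without a src, then compare case-insensitively.
                               .Where(s => s != null && s.StartsWith(pattern, StringComparison.OrdinalIgnoreCase));

            return urls;
        }
    }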

Adapted from: How can I use HTML Agility Pack to retrieve all the images from a website?

Edited to include usage and add a pattern parameter to GetFileUrls().
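
The question also asks for a method that takes a plain string instead of an HtmlDocument. If that string is the HTML markup itself rather than a page address, a minimal sketch could look like the following. It assumes the same HtmlAgilityPack package; the class name `ImageUrlExtractor`, the method name `GetFileUrlsFromHtml`, the `hostFragment` parameter, and the sample markup are hypothetical names chosen for illustration. It matches on a host fragment without the protocol, so http and https sources are treated alike.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class ImageUrlExtractor
{
    // Hypothetical helper (not part of the original answer): parses markup that is
    // already in memory and returns the img src values that contain hostFragment.
    public static IEnumerable<string> GetFileUrlsFromHtml(string html, string hostFragment)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // parse the string; no network access, no WebBrowser

        return document.DocumentNode.Descendants("img")
                       // A missing src attribute yields the empty default, which never matches.
                       .Select(e => e.GetAttributeValue("src", string.Empty))
                       .Where(s => s.IndexOf(hostFragment, StringComparison.OrdinalIgnoreCase) >= 0)
                       .ToList();
    }

    public static void Main()
    {
        // Made-up sample markup: one matching image, one non-matching, one without src.
        const string html =
            "<p><img src=\"https://www.gravatar.com/avatar/abc.png\">" +
            "<img src=\"http://example.org/logo.png\"><img></p>";

        foreach (string url in GetFileUrlsFromHtml(html, "www.gravatar.com/"))
        {
            Console.WriteLine(url);
        }
    }
}
```

Calling `ToList()` returns a materialized list rather than a lazy query, which makes the result easier to inspect in a debugger. The contents still have to be read by iterating over the list, not by calling `ToString()` on it.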

Community
tom982
  • I've tried what you suggested, but when I check the returned `urls.ToString()`, I get this: ``System.Linq.Enumerable+WhereEnumerableIterator`1[System.String]``. I tried to use the URL and it still didn't work. I don't know if I did something wrong – Peter Quill Dec 03 '15 at 18:49
  • I've edited my answer to show how to use it. As it's returning a collection of the images (`IEnumerable`), you can't just convert it to a string; `ToString()` only gives you the type name. You need to iterate through it, and then you can work with each URL. I also edited the method to add a second parameter, `pattern`, so you can specify what you want the URL to start with when you call the method. Be careful with http/https though, as it could give you some trouble. Perhaps consider changing it to `string.Contains()` and omitting the protocol, or even using a regex. – tom982 Dec 03 '15 at 19:11
  • Thanks, got it to work. My problem was that I was trying to check `urls` from inside `GetFileUrls()` instead of iterating with `url` as you did. – Peter Quill Dec 04 '15 at 18:22