
I want to extract the title, description and images from a URL using the HTML Agility Pack. So far I have not been able to find an example that is easy to understand and shows how to do it.

I would appreciate it if someone could help me with an example, so that I can extract the title and description and let the user select an image from the series of images on the page (something similar to Facebook when we share a link).

Updated:

I have placed a label for the title, a label for the description, a button and a textbox on the .aspx page, and I run the following code in the button click event. It returns null for all values, so maybe I am doing something wrong.

I used the following sample URL: http://edition.cnn.com/2012/10/31/world/asia/india/index.html?hpt=hp_t2

protected void btnGetURLDetails_Click(object sender, EventArgs e)
{
    HtmlDocument doc = new HtmlDocument();
    var response = txtURL.Text;
    doc.LoadHtml(response);

    String title = (from x in doc.DocumentNode.Descendants()
                    where x.Name.ToLower() == "title"
                    select x.InnerText).FirstOrDefault();

    String desc = (from x in doc.DocumentNode.Descendants()
                   where x.Name.ToLower() == "description"
                   select x.InnerText).FirstOrDefault();

    List<String> imgs = (from x in doc.DocumentNode.Descendants()
                         where x.Name.ToLower() == "img"
                         select x.Attributes["src"].Value).ToList<String>();

    lblTitle.Text = title;
    lblDescription.Text = desc;
}

The above code gives me a null value for every variable.

If I modify the code like this:

HtmlDocument doc = new HtmlDocument();
var url = txtURL.Text;

var webGet = new HtmlWeb();
doc = webGet.Load(url);

then I only get a value for the title; the description is still null.

Learning
  • I need to design a page where an admin adds a URL and the code then shows the title, description and images of that page. I can't give you the HTML because it depends on the URL; you can take any URL as an example, such as a news story or article on any website that also has images. – Learning Oct 31 '12 at 12:40
  • this may help http://stackoverflow.com/a/12239204/932418 – L.B Oct 31 '12 at 12:47

1 Answer

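// Requires: using System.IO; using System.Linq; using System.Net; using HtmlAgilityPack;
protected void btnGetURLDetails_Click(object sender, EventArgs e)
{
    // Download the page first: LoadHtml expects HTML markup, not a URL.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(txtURL.Text));
    request.Method = WebRequestMethods.Http.Get;

    String responseString;
    // The using blocks close the response and the reader even if reading fails.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        responseString = reader.ReadToEnd();
    }

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(responseString);

    // The page title is the text of the <title> element.
    String title = (from x in doc.DocumentNode.Descendants()
                    where x.Name.ToLower() == "title"
                    select x.InnerText).FirstOrDefault();

    // The description lives in <meta name="description" content="...">,
    // not in a <description> element.
    String desc = (from x in doc.DocumentNode.Descendants()
                   where x.Name.ToLower() == "meta"
                   && x.Attributes["name"] != null
                   && x.Attributes["name"].Value.ToLower() == "description"
                   && x.Attributes["content"] != null
                   select x.Attributes["content"].Value).FirstOrDefault();

    // Skip <img> elements without a src attribute to avoid a NullReferenceException.
    List<String> imgs = (from x in doc.DocumentNode.Descendants()
                         where x.Name.ToLower() == "img"
                         && x.Attributes["src"] != null
                         select x.Attributes["src"].Value).ToList<String>();

    lblTitle.Text = title;
    lblDescription.Text = desc;
}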
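
The `imgs` list above is collected but not shown to the user, and many pages use relative `src` values. As a rough sketch of the "let the user pick an image" step from the question, the helper below turns every usable `src` into an absolute URL. The class name `ImageCandidates` and the method `Collect` are hypothetical names chosen for illustration; they are not part of HTML Agility Pack.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ImageCandidates
{
    // Hypothetical helper: returns absolute URLs for every <img> with a usable src.
    public static List<string> Collect(HtmlDocument doc, Uri pageUri)
    {
        var result = new List<string>();
        foreach (HtmlNode img in doc.DocumentNode.Descendants("img"))
        {
            // GetAttributeValue returns the default ("") when src is missing,
            // which avoids the null dereference of Attributes["src"].Value.
            string src = img.GetAttributeValue("src", "").Trim();
            if (src.Length == 0) continue;

            // Resolve relative paths such as "/images/a.jpg" against the page URL.
            Uri absolute;
            if (Uri.TryCreate(pageUri, src, out absolute))
                result.Add(absolute.AbsoluteUri);
        }
        // Drop duplicates while keeping the order in which images appear.
        return result.Distinct().ToList();
    }
}
```

In the click handler this could be called as `ImageCandidates.Collect(doc, new Uri(txtURL.Text))`, and the resulting list bound to whatever control presents the choices.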

Danilo Vulović
  • Hi Danilo, thanks. I have updated my question with your solution, but I am not able to make it work: the values for title, desc, etc. are null for any URL. – Learning Nov 01 '12 at 04:41
  • You need to make an HttpWebRequest in order to get the response for the desired URL. I changed my answer. – Danilo Vulović Nov 01 '12 at 08:56
  • @Danilo, I was able to make it work, but the problem is that it returns null for `desc`. I even tried your new code, with the same result. – Learning Nov 01 '12 at 09:37
  • I use the following code to get the meta description, but the problem is that it is case sensitive: `Description` is different from `description`. ` HtmlNode node = doc.DocumentNode.SelectSingleNode("//meta[@name='description']"); // description: case sensitive var desc2 = ""; if (node != null) { desc2 = node.GetAttributeValue("content", ""); // TODO: write desc somewhere }` – Learning Nov 01 '12 at 09:39
  • Are you sure that the response contains a Description tag? As you can see from the code, there is a ToLower() method call, which is why description is in lower case. – Danilo Vulović Nov 01 '12 at 09:43
  • Can you post an example of the HTML response? – Danilo Vulović Nov 01 '12 at 09:51
  • example url `http://gulfnews.com/business/property/international/no-easy-homes-for-uk-homeowners-after-next-year-1.1096839` URL will vary – Learning Nov 01 '12 at 09:54
  • You do not have a `<description>` tag in your response. Instead, you have `<meta name="description" ...>`. I changed the code. – Danilo Vulović Nov 01 '12 at 09:57
  • Great, it works. Thanks for your help; it is good to learn from gurus like you. – Learning Nov 01 '12 at 10:02
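
The case-sensitivity problem raised in the comments (`Description` versus `description`) can also be handled without XPath by comparing the attribute value case-insensitively. The following is only a minimal sketch under that assumption; `MetaLookup` and `GetDescription` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class MetaLookup
{
    // Hypothetical helper: returns the content of the first
    // <meta name="description"> element, matching the name value
    // regardless of case, or "" when no such element exists.
    public static string GetDescription(HtmlDocument doc)
    {
        HtmlNode meta = doc.DocumentNode.Descendants("meta")
            .FirstOrDefault(m => string.Equals(
                m.GetAttributeValue("name", ""),
                "description",
                StringComparison.OrdinalIgnoreCase));

        // GetAttributeValue falls back to "" if the content attribute is absent.
        return meta == null ? "" : meta.GetAttributeValue("content", "");
    }
}
```

With this, `lblDescription.Text = MetaLookup.GetDescription(doc);` would cover both `name="Description"` and `name="description"`.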