I want to extract the release date of the film from this link.

The problem is that the date sits directly in a <td> tag that has no class or id. The only approach I can think of is to select the cell by its style attribute, but I have no idea how to do that.

Here's my code:

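url = "https://en.wikipedia.org/wiki/" + textBox1.Text.Replace(" ", "_");
try
{
    // Download and parse the page with HtmlAgilityPack.
    HtmlDocument doc = new HtmlWeb().Load(url);
    // SelectNodes returns null when nothing matches. Which XPath goes here?
    foreach (HtmlNode node in doc.DocumentNode.SelectNodes(/*?*/))
    {
        label1.Text += node.InnerText;
    }
}
catch (Exception ex3)
{
    // Show the error instead of silently swallowing it.
    label1.Text = ex3.Message;
}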

Please help!

Manfred Radlwimmer
Kabeer
  • Why don't you just use the [API](https://en.wikipedia.org/w/api.php)? Or since you want to get info about a movie the [API of some movie db](https://developer.fandango.com/Rotten_Tomatoes)? Honestly, downloading a wiki-page and manually parsing it would be the **last** thing I'd do. – Manfred Radlwimmer Aug 17 '17 at 13:28
  • @Manfred Radlwimmer it's sort of a project, and I'm only allowed to use html-agility-pack – Kabeer Aug 17 '17 at 13:31
  • If by that you mean it's some sort of school assignment, then whoever is teaching you is leading you down a very wrong path. – Manfred Radlwimmer Aug 17 '17 at 13:33
  • It's not a school project @Manfred Radlwimmer – Kabeer Aug 17 '17 at 13:36
  • Then who's stopping you from doing this *the right way*? The html-agility-pack has its uses and familiarity with it doesn't hurt, but it should be a last resort. When a site offers APIs, WebServices, RSS or anything similar - use that instead. – Manfred Radlwimmer Aug 17 '17 at 13:38

1 Answer

The following XPath expression gives you the element you need:

//*[@id="mw-content-text"]/div/table[1]/tbody/tr[14]/td

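Pro tip: Open Chrome DevTools, navigate to the element you are looking for, right-click it, and choose "Copy > Copy XPath". Note that browsers insert <tbody> elements that may be missing from the raw HTML that HtmlAgilityPack parses; if the expression matches nothing, try it without /tbody.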

Suggestion: This XPath expression is rather brittle because it depends on the positions of the table and the row. Sometimes it makes more sense to pull one specific value out of the HTML with a regular expression, which might be more stable. However, don't try to parse the HTML document as a whole with regex!
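A less position-dependent XPath can key on the header text of the row instead of on table[1] and tr[14]. The following is only a sketch: it assumes the infobox is a table whose class contains "infobox" and that the row looks like <tr><th>Release date</th><td>...</td></tr>; the class name ReleaseDateXPath and its members are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class ReleaseDateXPath
{
    // Assumption: the wanted row has a <th> containing "Release date"
    // and sits inside a table whose class attribute contains "infobox".
    public const string Expression =
        "//table[contains(@class,'infobox')]//tr[th[contains(., 'Release date')]]/td";

    // Returns the trimmed cell text, or null when no such row exists.
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches.
        HtmlNode cell = doc.DocumentNode.SelectSingleNode(Expression);
        return cell == null ? null : HtmlEntity.DeEntitize(cell.InnerText).Trim();
    }
}
```

Because the expression uses // instead of a fixed path, it does not care whether a <tbody> element is present, and the same Expression string can be passed to doc.DocumentNode.SelectNodes in the question's loop.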

larsbe
  • table[1] and tr[14] rely on positional indexes, so this will not work on a different wiki page. I think it is better to retrieve the whole table and look for the th element with the text 'Release Date'. – Sebastian Siemens Aug 17 '17 at 13:20
  • True! As I said, at this point it might make sense to use RegEx or just iterate over the table rows. – larsbe Aug 17 '17 at 13:22
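Iterating over the table rows, as suggested in the comments, might look like the sketch below. It assumes each infobox row pairs one <th> label with one <td> value; InfoboxReader and FindValue are hypothetical names, and the label comparison ignores case because the exact capitalization on the page may differ from 'Release Date'.

```csharp
using System;
using HtmlAgilityPack;

public static class InfoboxReader
{
    // Walks every table row and returns the text of the <td> whose
    // sibling <th> contains the given label (case-insensitive).
    // Returns null when no matching row is found.
    public static string FindValue(HtmlDocument doc, string label)
    {
        // SelectNodes returns null, not an empty collection, on no match.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table//tr");
        if (rows == null) return null;

        foreach (HtmlNode row in rows)
        {
            HtmlNode header = row.SelectSingleNode("./th");
            HtmlNode value = row.SelectSingleNode("./td");
            if (header == null || value == null) continue;

            if (header.InnerText.IndexOf(label, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return HtmlEntity.DeEntitize(value.InnerText).Trim();
            }
        }
        return null;
    }
}
```

In the question's handler this could be called as label1.Text = InfoboxReader.FindValue(doc, "Release date") ?? "not found"; after the page has been loaded into doc.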