I'm trying to get the content of the <p itemprop="articleBody"> elements using HTML Agility Pack. Here's a sample of the HTML I'm trying to parse:
<p itemprop="articleBody">
Hundreds of thousands of Ukrainians filled the streets of Kiev on Sunday, first to hear speeches and music and then to fan out and erect barricades in the district where government institutions have their headquarters.</p><p itemprop="articleBody">
Carrying blue-and-yellow Ukrainian and European Union flags, the teeming crowd filled
Independence Square, where protests have steadily gained momentum since Mr. Yanukovich refused on Nov. 21 to sign trade and political agreements with the European Union. The square has been transformed by a vast and growing tent encampment, and demonstrators have occupied City Hall and other public buildings nearby. Thousands more people gathered in other cities across the country. </p><p itemprop="articleBody">
“Resignation! Resignation!” people in the Kiev crowd chanted on Sunday, demanding that Mr. Yanukovich and the government led by Prime Minister Mykola Azarov leave office. </p>
I'm trying to parse the HTML above using the following code:
HtmlAgilityPack.HtmlWeb nytArticlePage = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument nytArticleDoc = new HtmlAgilityPack.HtmlDocument();
System.Diagnostics.Debug.WriteLine(articleUrl);
nytArticleDoc = nytArticlePage.Load(articleUrl);
var articleBodyScope =
    nytArticleDoc.DocumentNode.SelectNodes("//p[@itemprop='articleBody']");
EDIT:
But it seems that articleBodyScope is null (no nodes were matched), because this check:
if (articleBodyScope != null)
{
    System.Diagnostics.Debug.WriteLine("CONTENT NOT NULL");
    foreach (var node in articleBodyScope)
    {
        articleBodyText += node.InnerText;
    }
}
does not print "CONTENT NOT NULL", and articleBodyText remains empty.
If anyone could point me to a solution, I'd be grateful. Thanks in advance!