30

I'm trying to get all the divs whose class contains a certain word:

<div class="hello mike">content1</div>
<div class="hello jeff">content2</div>
<div class="john">content3</div>

I need to get all the divs whose class contains the word "hello". Something like this:

resultContent.DocumentNode.SelectNodes("//div[@class='hello']")

How can I do it with Html Agility Pack?

Ofer Gozlan

5 Answers

41

I got it:

resultContent.DocumentNode.SelectNodes("//div[contains(@class, 'hello')]")
BurnsBA
Ofer Gozlan
23

As of version 1.6.5, Html Agility Pack provides a .HasClass("class-name") method on nodes.

IEnumerable<HtmlNode> nodes =
    htmlDoc.DocumentNode.Descendants(0)
        .Where(n => n.HasClass("class-name"));
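On a version older than 1.6.5, a similar whole-word check can be written by hand. The sketch below is only an illustration: it uses System.Xml.Linq on a well-formed fragment instead of Html Agility Pack, and the helper name HasClassToken and the sample markup are made up for the example. It is not a statement about how HasClass is implemented.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ClassTokens
{
    // Hypothetical helper: true when the element's class attribute contains
    // className as a whole whitespace-separated token.
    public static bool HasClassToken(XElement element, string className)
    {
        // A missing class attribute converts to null, so fall back to "".
        string value = (string)element.Attribute("class") ?? "";
        return value
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Contains(className);
    }

    public static void Main()
    {
        // Assumed sample markup, based on the divs in the question.
        var doc = XDocument.Parse(
            "<root>" +
            "<div class=\"hello mike\">content1</div>" +
            "<div class=\"hello jeff\">content2</div>" +
            "<div class=\"john\">content3</div>" +
            "<div>content4</div>" +
            "</root>");

        var matches = doc.Descendants("div")
            .Where(d => HasClassToken(d, "hello"))
            .Select(d => d.Value);

        Console.WriteLine(string.Join(",", matches)); // prints content1,content2
    }
}
```

Splitting on whitespace and comparing whole tokens means a div without a class attribute is skipped instead of throwing.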
d219
Tohid
  • 4
    The above is 5 times faster than the most popular answer, though I used document.DocumentNode.Descendants().Where(x => x.HasClass(.... – Jaycee Jun 28 '19 at 15:12
12

I'm sure that doesn't work because there are multiple classes in your div. You can try this instead:

resultContent.DocumentNode.Descendants("div").Where(d => d.Attributes["class"].Value.Contains("hello"));
Bikee
  • 6
    Has one drawback as opposed to the other answer: it throws an exception if there's a div without `class`. Use this instead: `.Where(d => d.GetAttributeValue("class", "").Contains("hello"));` – Tim Schmelter Apr 19 '16 at 08:01
1

As you have specified that the class has to contain a certain word, the following will ensure that the word is:

  • at the start of the string and followed by a space
  • or in the middle of the string and surrounded by whitespace
  • or at the end of the string and preceded by a space
  • or the only class name in the class attribute

It does so by checking whether the value of the class attribute, padded with a space on each side, contains the specified word (hello) padded the same way. This avoids false positives like class="something-hello-something".

resultContent.DocumentNode.SelectNodes("//div[contains(concat(' ', @class, ' '), ' hello ')]");
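To see the difference between this expression and a plain contains(@class, 'hello'), here is a small sketch that runs both against the same markup. It is an illustration under assumptions: it uses the XPath support in System.Xml.Linq rather than Html Agility Pack, and the extra class="othello" div, the class name XPathClassDemo, and the helper SelectValues are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathClassDemo
{
    // Substring test: matches any class value that contains "hello".
    public const string Loose = "//div[contains(@class, 'hello')]";

    // Space-padded test: matches only the whole word "hello".
    public const string Strict =
        "//div[contains(concat(' ', @class, ' '), ' hello ')]";

    // Hypothetical helper: returns the text of every element the XPath selects.
    public static List<string> SelectValues(XDocument doc, string xpath)
    {
        return doc.XPathSelectElements(xpath).Select(e => e.Value).ToList();
    }

    public static void Main()
    {
        // Assumed sample markup; the "othello" div is added to show a false positive.
        var doc = XDocument.Parse(
            "<root>" +
            "<div class=\"hello mike\">content1</div>" +
            "<div class=\"hello jeff\">content2</div>" +
            "<div class=\"john\">content3</div>" +
            "<div class=\"othello\">content4</div>" +
            "</root>");

        Console.WriteLine(string.Join(",", SelectValues(doc, Loose)));  // prints content1,content2,content4
        Console.WriteLine(string.Join(",", SelectValues(doc, Strict))); // prints content1,content2
    }
}
```

Note that the padded form only treats the space character as a separator, so class names separated by tabs or newlines would need extra handling.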
Keith Hall
0
HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.Load(filePath);
foreach (HtmlNode link in htmlDoc.DocumentNode.SelectNodes("//div[@class='hello']"))
{
    // code
}
Divyesh
  • 3
    Doesn't work. OP is trying to find all divs where the class _contains_ a word as a substring, for example `hello`. You are selecting only divs where the class **is** `hello` – Tim Schmelter Apr 19 '16 at 08:03