Questions tagged [html-agility-pack]

HTML Agility Pack is an open-source HTML parser that builds a read/write DOM and supports LINQ, plain XPath, or XSLT.

It is a .NET code library for parsing HTML files as they are found on the web. The parser is very tolerant of malformed HTML. The object model is very similar to the one System.Xml offers, but for HTML documents or streams.

The easiest way to install HTML Agility Pack is through its NuGet package:

Install-Package HtmlAgilityPack

Latest stable release: 1.11.3 / 18 April 2019

GitHub page: https://github.com/zzzprojects/html-agility-pack

3466 questions
653
votes
7 answers

How to use HTML Agility pack

How do I use the HTML Agility Pack? My XHTML document is not completely valid. That's why I wanted to use it. How do I use it in my project? My project is in C#.
chh
87
votes
2 answers

HtmlAgilityPack: Get whole HTML document as string

Does HtmlAgilityPack have the ability to return the whole HTML markup from an HtmlDocument object as a string?
deostroll
  • 11,661
  • 21
  • 90
  • 161
76
votes
6 answers

Html Agility Pack get all elements by class

I am taking a stab at html agility pack and having trouble finding the right way to go about this. For example: var findclasses = _doc.DocumentNode.Descendants("div").Where(d => d.Attributes.Contains("class")); However, obviously you can add…
Adam
  • 3,615
  • 6
  • 32
  • 51
66
votes
5 answers

How to get html elements with multiple css classes

I know how to get a list of DIVs of the same CSS class, e.g. using XPath //div[@class='class1']. But what if a div has multiple classes? What will the…
seasong
  • 763
  • 1
  • 6
  • 7
63
votes
5 answers

HTML Agility pack - parsing tables

I want to use the HTML agility pack to parse tables from complex web pages, but I am somehow lost in the object model. I looked at the link example, but did not find any table data this way. Can I use XPath to get the tables? I am basically lost…
weismat
  • 7,195
  • 3
  • 43
  • 58
60
votes
3 answers

HtmlAgilityPack and HtmlDecode

I am currently using HtmlAgilityPack with a console application to scrape a website. Since the html is encoded (it returns encoded characters like ') I have to decode before I save the content to my database. Is there a way to decode the…
Thomas
  • 5,888
  • 7
  • 44
  • 83
53
votes
5 answers

HtmlAgilityPack and selecting Nodes and Subnodes

Hope somebody can help me. Let's say I have an HTML document that contains multiple divs like this example:
Richard Winchester Kodak
The Jack
  • 553
  • 1
  • 4
  • 6
52
votes
4 answers

xpath search for divs where the id contains specific text

On my HTML page I have forty divs but I only want one div. Using agility pack to search and get all the divs with Ids I use this XPath: "//div[@id]" But how do I search for divs with Ids where the id contains the text "test" like so:
Hello-World
  • 9,277
  • 23
  • 88
  • 154
51
votes
5 answers

HTML agility pack - removing unwanted tags without removing content?

I've seen a few related questions out here, but they don’t exactly talk about the same problem I am facing. I want to use the HTML Agility Pack to remove unwanted tags from my HTML without losing the content within the tags. So for instance, in my…
Mathias Lykkegaard Lorenzen
  • 15,031
  • 23
  • 100
  • 187
43
votes
2 answers

XPath wildcard in attribute value

I have the following XPath to match attributes of the class span: //span[@class='amount'] I want to match all elements that have the class attribute of "amount" but also may have other classes as well. I thought I could do…
Colin Brown
  • 589
  • 1
  • 8
  • 15
42
votes
10 answers

Convert (render) HTML to Text with correct line-breaks

I need to convert HTML string to plain text (preferably using HTML Agility pack). With proper white-spaces and, especially, proper line-breaks. And by "proper line-breaks" I mean that this code:
line1 …
Alex from Jitbit
  • 53,710
  • 19
  • 160
  • 149
41
votes
3 answers

HtmlAgilityPack selecting childNodes not as expected

I am attempting to use the HtmlAgilityPack library to parse some links in a page, but I am not seeing the results I would expect from the methods. In the following I have a HtmlNodeCollection of links. For each link I want to check if there is an…
Sheff
  • 3,474
  • 3
  • 33
  • 35
40
votes
2 answers

HtmlAgilityPack - How to get the tag by Id?

I have a task to do. I need to retrieve the a tag or href of a specific id (the id is based on user input). For example, I have HTML like this
knowme
  • 407
  • 1
  • 4
  • 10
39
votes
8 answers

Grab all text from html with Html Agility Pack

Input

foo bar baz

Output foo bar baz I know of htmldoc.DocumentNode.InnerText, but it will give foobarbaz - I want to get each text, not all at a time.
Surajit
  • 527
  • 3
  • 8
  • 16
35
votes
2 answers

Parsing HTML page with HtmlAgilityPack

Using C# I would like to know how to get the Textbox value (i.e: john) from this sample html script :
Name :
Hassen
  • 860
  • 2
  • 11
  • 23