
I am totally new to XPath and have only a very basic understanding of it. I first started using XPath to find web elements on HTML pages.

Now, while searching the web (videos and text), I found that all XPath tutorials are about XML (and not HTML pages).

Wikipedia says:

"XPath (XML Path Language) is a query language for selecting nodes from an XML document."

This has confused me a lot.

  1. Is XPath not used for HTML documents?
  2. Are there any fundamental or structural differences in writing XPath for HTML, XML, and XHTML?

I understand that this question is rather basic, but I am asking it here out of utter confusion.

kjhughes

1 Answer


You have a right to be confused.

XPath operates on a data model that generally assumes the markup is well-formed. XML and XHTML are well-formed by definition; HTML is not necessarily so. However, in the spirit of being liberal in what one accepts as input, HTML parsers can often parse markup that is not well-formed into a data model suitable for XPath anyway.

Therefore, you can usually use XPath with HTML as well. In fact, using XPath this way is a common web scraping technique.
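As a rough illustration of that idea, here is a minimal sketch using only Python's standard library. The `LenientTreeBuilder` class and `parse_html` function are hypothetical helpers written for this example, not part of any library, and the implied-end-tag handling is deliberately simplified. Note also that `xml.etree.ElementTree` supports only a limited subset of XPath; for real scraping, a third-party library with a proper HTML parser and full XPath 1.0 support (lxml is one) is the more usual choice.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Simplifying assumptions for this sketch (real HTML parsers know many more rules).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # never have end tags
SELF_CLOSING_SIBLINGS = {"li", "p"}  # a new <li>/<p> closes an open one


class LenientTreeBuilder(HTMLParser):
    """Hypothetical helper: builds an ElementTree from forgiving HTML input."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Imply the missing end tag, e.g. <li>one<li>two.
        if tag in SELF_CLOSING_SIBLINGS and self.stack[-1].tag == tag:
            self.stack.pop()
        attributes = {name: value or "" for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attributes)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        current = self.stack[-1]
        if len(current):
            # Text following a child element is stored as that child's tail.
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


def parse_html(markup):
    builder = LenientTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


# Not well-formed XML: unquoted attributes, unclosed <li> and <p>, a void <br>.
html = "<ul><li class=item>one<li class=item>two</ul><p>See <a href=/docs>docs</a><br>bye"

root = parse_html(html)
items = [li.text for li in root.findall(".//li[@class='item']")]
link = root.find(".//a").get("href")
print(items, link)
```

An XML parser would reject the same string outright (`ET.fromstring(html)` raises `ParseError`), which is the distinction the answer draws: the XPath expressions themselves look the same either way, and what differs is how forgiving the parser is when it builds the tree.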

kjhughes