Count all nodes in an HTML file

Question

Is there an easy way to count the nodes in an HTML file? I also need to count nodes of a certain type, such as div.

I'd like to do this without an external library like HTMLAgilityPack, if possible. Also, the HTML I'm dealing with is not guaranteed to be well-formed or valid.

Is there a way to do this from C#?

Thanks.

By "node", do you mean only parent-level nodes like …, or everything, i.e. even the nested ones like …? — Pawan Mishra, Nov 06 '11 at 18:08

score 0 · Answer 1 · answered Nov 06 '11 at 18:13

If you have XHTML, you can load it into an XDocument and use the XML manipulation API or LINQ to XML to count the particular nodes.
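For the XHTML case, a minimal sketch of the LINQ to XML approach might look like the following. It assumes the markup parses as XML; the names XhtmlNodeCounter and CountElements are illustrative, not part of any library. Comparing Name.LocalName instead of the full XName matters because XHTML documents normally declare the http://www.w3.org/1999/xhtml default namespace, and a plain Descendants("div") would then find nothing.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, for illustration only.
public static class XhtmlNodeCounter
{
    // Counts every element in the document, including the root <html> element.
    public static int CountElements(XDocument doc) =>
        doc.Descendants().Count();

    // Counts elements with the given tag name. Only the local name is compared,
    // so elements in the XHTML default namespace are still found.
    public static int CountElements(XDocument doc, string tagName) =>
        doc.Descendants().Count(e => e.Name.LocalName == tagName);

    public static void Main()
    {
        const string xhtml =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
            "<body><div><p>Hello</p></div><div/></body></html>";

        XDocument doc = XDocument.Parse(xhtml);
        Console.WriteLine(CountElements(doc));        // html, body, div, p, div: 5
        Console.WriteLine(CountElements(doc, "div")); // 2
    }
}
```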
If you don't, you can try regular expressions. However, this only works for a small number of tags you are interested in, since you have to write an expression by hand for each tag.
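A rough sketch of the regex approach, assuming you only need opening-tag counts for a handful of known tags (RegexTagCounter and CountOpeningTags are made-up names). Because nothing is parsed, it tolerates malformed markup, but for the same reason it also counts tags that appear inside comments, script blocks or attribute values.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class RegexTagCounter
{
    // Counts opening tags such as <div>, <DIV class="x"> or <div/>.
    // The \b keeps "<div" from also matching "<divider>"; closing tags
    // such as </div> are not matched because they start with "</".
    public static int CountOpeningTags(string html, string tagName)
    {
        string pattern = "<" + Regex.Escape(tagName) + @"\b";
        return Regex.Matches(html, pattern, RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        // Deliberately malformed: the <p> is never closed.
        const string html = "<DIV><div class=\"a\"><p>unclosed<div/></DIV><divider>";
        Console.WriteLine(CountOpeningTags(html, "div")); // 3
        Console.WriteLine(CountOpeningTags(html, "p"));   // 1
    }
}
```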

score 0 · Answer 2 · edited May 23 '17 at 12:11

With the LINQ to XML API, you can easily parse and loop through all the nodes of an HTML document, provided the document is well-formed XML (XHTML). You can find helpful articles about LINQ to XML, although they are all written in the context of parsing XML documents.
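The sketch below shows the difference between looping over all nodes (elements, text, comments) and counting only elements, and how to detect input that LINQ to XML cannot handle: XDocument.Parse throws an XmlException when the markup is not well-formed, which is common in real-world HTML. NodeWalker and TryCountNodes are hypothetical names.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class, for illustration only.
public static class NodeWalker
{
    // Counts every node below the document and, separately, the element nodes.
    // Returns false if the markup cannot be parsed as XML.
    public static bool TryCountNodes(string markup, out int allNodes, out int elements)
    {
        allNodes = elements = 0;
        XDocument doc;
        try
        {
            doc = XDocument.Parse(markup);
        }
        catch (XmlException)
        {
            // Typical of real-world HTML: unclosed tags, <br> without "/", etc.
            return false;
        }

        foreach (XNode node in doc.DescendantNodes())
        {
            allNodes++;                              // elements, text, comments, ...
            if (node.NodeType == XmlNodeType.Element)
                elements++;                          // element nodes only
        }
        return true;
    }

    public static void Main()
    {
        const string wellFormed = "<html><body><!-- note --><p>Hi</p></body></html>";
        if (TryCountNodes(wellFormed, out int all, out int els))
            Console.WriteLine($"{all} nodes, {els} elements"); // 5 nodes, 3 elements

        const string malformed = "<html><body><p>Unclosed<br></body></html>";
        if (!TryCountNodes(malformed, out _, out _))
            Console.WriteLine("not well-formed XML");
    }
}
```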

Here is a similar thread on Stack Overflow: C# Is there a LINQ to HTML, or some other good .Net HTML manipulation API?

answered Nov 06 '11 at 18:16 · Pawan Mishra

score 0 · Accepted Answer · edited May 23 '17 at 12:29

First of all, are you sure a client-side solution using JavaScript isn't sufficient for your needs? The easiest way to count nodes within an HTML document is to use jQuery in the browser.

<script src="http://code.jquery.com/jquery-1.7.min.js"></script>
<script>
    $(function () { // run once the DOM is ready, so <body> exists
        console.log($('html').children().length);      // direct child elements of <html>
        console.log($('body').children().length);      // direct child elements of <body>
        console.log($('body').children('div').length); // direct children of <body> that are divs
        console.log($('body').find('div').length);     // all div elements nested anywhere in <body>
        console.log($('*').length);                    // every element in the document
    });
</script>

If you are unfamiliar with jQuery, take a look at www.jquery.com.

If you still need a C# solution for server-side parsing of the document, I would recommend HTMLAgilityPack (even though you'd rather not use it). Writing your own parser seems like a waste of time to me, since you need to handle malformed HTML/XML and similar cases, which can be a pain.

Try this Stack Overflow question: What is the best way to parse html in C#?

Hope this meets your needs.

It's because I need to do it from C# - sorry if that wasn't clear, I'll update my question... — Jimmy Collins, Nov 06 '11 at 18:18
These files aren't going near a browser - it's a file diff on some localized files vs. the source files. — Jimmy Collins, Nov 06 '11 at 18:24
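Since the goal is a diff of localized files against their source files, one possible approach (an assumption, not something proposed in the thread) is to compare per-tag counts of the two files and report any tag name whose count differs. This sketch uses a single regex for all opening tags, so it tolerates malformed HTML, but it shares the usual regex caveats: tags inside comments or scripts are counted too, and nesting is not checked. TagCountDiff, CountTags and Diff are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class TagCountDiff
{
    // Captures the name of every opening or self-closing tag, e.g. "div" in <div class="x">.
    // Closing tags (</div>), comments (<!-- -->) and doctypes (<!DOCTYPE>) are skipped
    // because the name must start with a letter right after "<".
    private static readonly Regex OpeningTag = new Regex(@"<([A-Za-z][A-Za-z0-9]*)");

    // Maps each lower-cased tag name to the number of times it opens.
    public static Dictionary<string, int> CountTags(string html) =>
        OpeningTag.Matches(html)
            .Cast<Match>()
            .GroupBy(m => m.Groups[1].Value.ToLowerInvariant())
            .ToDictionary(g => g.Key, g => g.Count());

    // Returns one line per tag name whose count differs between the two documents.
    public static List<string> Diff(string sourceHtml, string localizedHtml)
    {
        var source = CountTags(sourceHtml);
        var localized = CountTags(localizedHtml);
        var differences = new List<string>();
        foreach (string tag in source.Keys.Union(localized.Keys).OrderBy(t => t))
        {
            source.TryGetValue(tag, out int s);    // 0 if the tag is missing
            localized.TryGetValue(tag, out int l); // 0 if the tag is missing
            if (s != l)
                differences.Add($"{tag}: source {s}, localized {l}");
        }
        return differences;
    }

    public static void Main()
    {
        // In practice these strings would be read from the two files being compared.
        const string source = "<div><p>Hello</p><p>World</p></div>";
        const string localized = "<div><p>Hallo</p>Welt</p></div>";
        foreach (string line in Diff(source, localized))
            Console.WriteLine(line); // p: source 2, localized 1
    }
}
```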
