
I've been working on cleaning up a very messy ASP.NET project, and I have a tool that measures project complexity in various ways, so I can show the results of my work: as I clean up, the complexity goes down.

One of my metrics was HTML markup line count, but I've realized that this isn't a very good way to measure, because line count is subject to inflation during formatting; this snippet:

<span><em>This is bold</em></span>

should have the same score as the pretty printed version:

<span>
  <em>This is bold</em>
</span>

But simply counting lines shows the second snippet having more lines.

What would be a better way to compute the complexity of markup, to capture the structural complexity, not just line count?

Update: Commenters asked what I mean by complexity. I mean how much structure the page has. My original example wasn't the best one because the two snippets are structurally the same. My ultimate goal is to convert sloppy table-driven layouts to CSS, and I want to measure how much "less" code there is when that's done. Simply counting the number of nodes doesn't quite get at the nesting structure. Is there a metric that would capture the node count AND the nesting depth?

Joshua Frank
  • I'm not entirely sure that *"complexity"* of markup is what you're measuring here; it sounds more like *"depth"*. There are many tools that would let you "format" the document first, which translates example A to example B; that way your line count would give you what you want. – James Aug 15 '14 at 11:50
  • I think the term "Messy" is a little too vague to make an accurate description of what you are trying to measure. For example, I consider "Messy" HTML to be HTML which does not properly close tags, uses deprecated attributes, or does not abstract its style information into a separate CSS file. – alstonp Aug 15 '14 at 12:01
  • You both make good points. I've updated my question to address this a little more. – Joshua Frank Aug 15 '14 at 12:44
  • Complexity of the page is not really a good measure of achievement, in either direction: Pages of the exact same complexity can be both awesome and horrible. Remember, the saying goes: "As simple as possible, but no more!" – Deduplicator Aug 15 '14 at 21:45

1 Answer

You can use the HTML Agility Pack to parse your HTML into a DOM tree and then count the number of nodes in it.

The node count is a good measure of the complexity of an HTML page. The fewer nodes there are, the less complex the HTML is, and as a result any given element can be found faster when you search for it with JavaScript.

Reducing the number of DOM elements is also recommended in Best Practices for Speeding Up Your Web Site by Yahoo.
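To make the node-counting idea concrete, and to cover the nesting depth the question also asks about, here is a minimal sketch. It is not HTML Agility Pack code: it swaps in Python's standard-library `html.parser` so that it runs without extra packages. The names `MarkupMetrics` and `measure`, and the "sum of depths" score, are illustrative assumptions rather than an established metric.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not deepen the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class MarkupMetrics(HTMLParser):
    """Collects element count, maximum nesting depth, and summed depth."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open elements
        self.node_count = 0   # number of elements seen (text nodes ignored)
        self.max_depth = 0    # deepest nesting level reached
        self.depth_sum = 0    # every element weighted by its depth

    def _record(self, depth):
        self.node_count += 1
        self.depth_sum += depth
        self.max_depth = max(self.max_depth, depth)

    def handle_starttag(self, tag, attrs):
        self._record(len(self.open_tags) + 1)
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: count it, never push it.
        self._record(len(self.open_tags) + 1)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def measure(html):
    """Return (node_count, max_depth, depth_sum) for a markup string."""
    parser = MarkupMetrics()
    parser.feed(html)
    parser.close()
    return parser.node_count, parser.max_depth, parser.depth_sum
```

With this sketch, `measure` returns `(2, 2, 3)` for both the one-line and the pretty-printed `<span><em>` snippets from the question, so formatting no longer inflates the score. A table-style fragment such as `<table><tr><td><div>x</div></td></tr></table>` gives `(4, 4, 10)`, while the bare `<div>x</div>` gives `(1, 1, 1)`. One caveat: unlike a real DOM parser, this does not apply HTML's implied end-tag rules, so markup that leaves tags such as `<p>` or `<td>` unclosed will report inflated depths.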

Other links:
How to use HTML Agility pack
How to get the count of tables in an html file with C# and html-agility-pack
Count specific child nodes with HtmlAgilityPack

Aristos