Is it possible to use the HAP (HTML Agility Pack) to:

  1. Grab a collection of nodes, e.g. all <a> elements that are children of <li> elements
  2. Iterate over the collection
  3. Add CSS class names to each element's class attribute, e.g. class &= "foo"
  4. Update the nodes in their original position within the HTML

For point 4, I need to know:

  • When I grab a collection of nodes, am I working with copies?
  • If so, can I easily update the nodes in their original position within the HTML?

Finally, would it be practical to do this when rendering a page in an ASP.NET website, considering:

  • I will need to modify the class references for no more than 100 elements
  • I am not working with large HTML documents
  • I plan to select my nodes starting at a div, e.g. div[2], where the body contains 4 divs

I realise that this may seem like a bunch of separate questions but really it is just a breakdown of the following two questions:

  • Can I easily modify the HTML output of an ASP.NET page e.g. to insert class references?
  • Would it be practical to do this on 50-100 elements in terms of speed, e.g. at a cost of no more than 2 seconds?

Many thanks.

Chris Cannon

2 Answers

Don't do that! ASP.NET is not meant to be used that way; there are better ways to do this task, depending on how you create the markup in which you want to change or add CSS classes. ASP.NET uses .aspx templates, which are basically HTML markup in which you can intervene with code executing on the server. There you can set a CSS class statically, or use server-side script to set it on the markup with some code.

You can also create controls in the code-behind and set the CSS class on an anchor control if its parent is a list-item control (you will have to use server-side controls for this).

To do it your way, you will have to write a response filter (example here): after the request is done, do your parsing and write the results back to the response stream. It's much easier to use the common ASP.NET techniques.
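A response filter in this sense is a `Stream` that wraps `Response.Filter`. Below is a minimal sketch of one, assuming you want to buffer the whole page and rewrite it in one go; the class name `HtmlRewriteFilter` and its constructor parameters are hypothetical, not part of ASP.NET or any library. It only uses `System.IO` and `System.Text`, and leaves the actual HTML parsing (HAP, CsQuery or anything else) to the `transform` delegate.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical response filter: collects the page output and rewrites it once,
// when the filter is closed (assumed to happen at the end of the request).
public sealed class HtmlRewriteFilter : Stream
{
    private readonly Stream _inner;                    // stream the final HTML goes to
    private readonly Func<string, string> _transform;  // your parsing/modification step
    private readonly Encoding _encoding;
    private readonly MemoryStream _buffer = new MemoryStream();
    private bool _closed;

    public HtmlRewriteFilter(Stream inner, Func<string, string> transform, Encoding encoding)
    {
        if (inner == null) throw new ArgumentNullException(nameof(inner));
        if (transform == null) throw new ArgumentNullException(nameof(transform));
        _inner = inner;
        _transform = transform;
        _encoding = encoding ?? Encoding.UTF8;
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => !_closed;
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    // Buffer output instead of forwarding it: the parser needs the whole document.
    public override void Write(byte[] buffer, int offset, int count)
    {
        if (_closed) throw new ObjectDisposedException(nameof(HtmlRewriteFilter));
        _buffer.Write(buffer, offset, count);
    }

    // Deliberately a no-op, so intermediate flushes don't send partial HTML.
    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    // Runs once on Close/Dispose: decode, transform, re-encode, forward.
    protected override void Dispose(bool disposing)
    {
        if (disposing && !_closed)
        {
            _closed = true;
            string html = _encoding.GetString(_buffer.ToArray());
            byte[] output = _encoding.GetBytes(_transform(html));
            _inner.Write(output, 0, output.Length);
            _inner.Flush();
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

In a Web Forms page this would presumably be attached early in the request, for example in `Page_Load` with `Response.Filter = new HtmlRewriteFilter(Response.Filter, html => html /* parse and modify here */, Response.ContentEncoding);`. Buffering until the filter is closed (and ignoring intermediate `Flush` calls) is a deliberate trade-off: the parser always sees a complete document and multi-byte characters are never split between writes, at the cost of holding the whole page in memory and delaying the first byte sent to the client.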

Antonio Bakula

Check out my CsQuery project: https://github.com/jamietre/csquery, also available on NuGet as "CsQuery".

This is a C# (.NET 4) port of jQuery. Its selectors are orders of magnitude faster than those of the HTML Agility Pack; in fact, my original purpose in writing it was to do exactly what you want to do: manipulate HTML in real time (as it happens, HTML from a CMS, generated by CKEditor).

To intercept the HTML in Web Forms with CsQuery, do this in the page's code-behind:

using System.Web.UI;   // HtmlTextWriter
using CsQuery;
using CsQuery.Web;

// Inside your page's code-behind class:
protected override void Render(HtmlTextWriter writer)
{
    // The CsQueryHttpContext object is part of the CsQuery library; it's a helper
    // that abstracts the process of intercepting base.Render() for you.
    CsQueryHttpContext csqContext =
        WebForms.CreateFromRender(Page, base.Render, writer);

    // A CQ object is like a jQuery object. The "Dom" property of the context
    // returned above represents the output of this page.
    CQ doc = csqContext.Dom;

    // Add the class "foo" to every <a> that is a direct child of an <li>.
    doc["li > a"].AddClass("foo");

    // Write the modified HTML to the response.
    csqContext.Render();
}

There is basic documentation on GitHub, but apart from getting HTML in and out, it works pretty much like jQuery. The WebForms object above is just to help you handle interacting with the HtmlTextWriter object and the Render method. The general-purpose usage is very simple:

CQ doc = CQ.Create(htmlString);
// or:
// CQ doc = CQ.CreateFromUrl(url);

// ... do stuff with doc, a CQ object that acts like a jQuery object, e.g.
doc["li > a"].AddClass("foo");

string html = doc.Render();
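The `AddClass` call in the Render example above takes care of a detail hidden in the question's `class &= "foo"`: in VB, `&=` simply concatenates, so an existing `"bar"` would become `"barfoo"`. jQuery-style `addClass` instead treats the attribute as a whitespace-separated list of class names and skips names that are already present. For comparison, here is a minimal library-free sketch of that rule; the `ClassAttribute.AddClass` helper is hypothetical and only models the behaviour, it is not CsQuery's implementation.

```csharp
using System;
using System.Collections.Generic;

public static class ClassAttribute
{
    // Adds one or more space-separated class names to an existing class
    // attribute value, keeping the current order and skipping duplicates.
    public static string AddClass(string existing, string classNames)
    {
        var tokens = new List<string>(Tokenize(existing));
        foreach (string name in Tokenize(classNames))
        {
            // List<string>.Contains uses ordinal (case-sensitive) string equality.
            if (!tokens.Contains(name))
            {
                tokens.Add(name);
            }
        }
        return string.Join(" ", tokens);
    }

    // A null separator array tells String.Split to split on any whitespace.
    private static string[] Tokenize(string value)
    {
        return (value ?? string.Empty).Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

If you stay with the HTML Agility Pack, the same helper could be applied to each node it selects, e.g. `node.SetAttributeValue("class", ClassAttribute.AddClass(node.GetAttributeValue("class", ""), "foo"));`. HAP's `SelectNodes` hands back the nodes of the loaded document itself rather than copies, so such a change is reflected when the document is written out again.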
Jamie Treworgy
  • Sounds cool, just a couple of questions: (a) Is it compatible with VB.NET? (b) Is the WebForms class part of CQ? (c) What is the data type of csqContext? (d) Am I right in thinking that the first code sample is specific to overriding the Render method, and the second is just for when you're working with HTML in a string? (e) What are the advantages of code sample 1 over code sample 2 for my scenario WRT performance? – Chris Cannon Jun 13 '12 at 20:46
  • a) Yes (same as any .NET library). b) Yes. c) It's a CsQuery class, `CsQueryHttpContext`, used just for handling the `Render` method. d) Yes. e) Sample 1 is what you would use with a Web Forms `Page`; sample 2 just shows how it works in a general context. The WebForms class does the same thing anyway (it all starts from a string), but there's a bit of hassle getting from the `HtmlTextWriter` to a plain old string and back again, and that class covers it. Take a look at the source of the `WebForms` object to see how it works; it's mostly converting between StringBuilders and TextWriters and so on. – Jamie Treworgy Jun 14 '12 at 02:53
  • .. Performance-wise, you should have nothing to worry about. In your initial question you said something like "no more than 2 seconds". For some basic performance testing I use a 6 MB file with 100,000 elements. On my laptop CsQuery can parse it in under 2 seconds, and a two-part selector `div > span` returns almost 2,000 spans in about 15 milliseconds. Altering elements takes almost no time (you're just setting properties). So with normal-sized HTML files, a typical set of operations should take milliseconds. But just try it out and see how it works for you. – Jamie Treworgy Jun 14 '12 at 02:57
  • Thanks very much. I am very excited to try it out; I will do so this weekend. Could I have some instructions for integrating it with a Web Forms website? One final question: what are the advantages of CsQuery over HAP? – Chris Cannon Jun 14 '12 at 09:07
  • The code above is all you really need to integrate with Web Forms; if you have specific questions, feel free to contact me directly. There is more documentation in the readme on the GitHub repo linked from my answer. I think the advantages are speed and familiar syntax for selecting and manipulating HTML (CSS selectors and the jQuery API). – Jamie Treworgy Jun 14 '12 at 10:43
  • I mean installation: do I simply copy some DLLs to my bin folder and away I go, or is there a better way? P.S. Please have a look at my other question, which is essentially the same as this one: http://stackoverflow.com/q/10970737/1071203 (there is a bounty on it!) – Chris Cannon Jun 14 '12 at 16:29
  • Oh, sure, just add a reference to `CsQuery.dll`. Easiest of all is to use the NuGet package manager: `install-package csquery` downloads the package and adds the reference for you. – Jamie Treworgy Jun 14 '12 at 17:56
  • I posted an answer there, copied from this one, but I added some more info you may find useful. FWIW, I have written all the framework infrastructure needed to write pure HTML pages and parse them directly with CsQuery, without the Web Forms/.aspx parser ever being involved. Unfortunately it's too tightly coupled with that project right now to share easily, so I'd just start by intercepting `Render` in a regular .aspx page. If you find this works well for you and want to take it to the next level, I'd be happy to work with you on setting up a framework. – Jamie Treworgy Jun 14 '12 at 18:36
  • Can I ask why you use `var` so often in the examples above, e.g. `var csqContext = WebForms.CreateFromRender(Page, base.Render, writer);`? Am I right in thinking that csqContext is a variable? Is the data type of csqContext not known? – Chris Cannon Jun 15 '12 at 18:16
  • `var` is shorthand; see http://msdn.microsoft.com/en-us/library/bb384061.aspx. In fact you can ONLY use it when the compiler can infer the type of the object at compile time. It just makes code more readable. `CreateFromRender` returns a special object used to manage interaction with the Render override; it is part of CsQuery and is of type `CsQueryHttpContext`. – Jamie Treworgy Jun 15 '12 at 19:22
  • OK, no problem. I think I will avoid implicit typing where possible; I thought `var` was only supposed to be used in "special cases", e.g. LINQ. Anyhow, I will be giving CsQuery a whirl tomorrow. Thanks very much for your help. – Chris Cannon Jun 15 '12 at 21:03
  • It's just syntactic sugar, like automatic properties. A lot of people find it improves readability, and it makes life easier since you don't need to remember and type out the type of every object you use. In this case I probably should have typed it out, since it's an example using something unfamiliar; I have updated it. But you know, it's all implicit typing these days... That said, there's nothing wrong with the long form either, but I find that the less I have to type, the fewer mistakes I make :) – Jamie Treworgy Jun 15 '12 at 21:57
  • Just tried it and I am very happy with it, although `nth-child(2)` doesn't work for me, so I am using `eq(1)`. Thanks again, this will really reduce the amount of time I spend on CSS! – Chris Cannon Jun 17 '12 at 20:33
  • Glad to hear it! Can you show me code where nth-child isn't working, e.g. the HTML and selector? Either email me directly or post an issue on GitHub; my email is in my profile. I am very concerned with ensuring that all CSS selectors work right! – Jamie Treworgy Jun 18 '12 at 12:36
  • I was trying it out with a selector such as "div#Main > p:nth-child(2)". I tried the nth-child selector several times and couldn't get it to work, but it's OK, as I'm using eq(2) and that works fine. – Chris Cannon Jun 19 '12 at 21:46
  • Would you mind sharing an example of the HTML against which it did not work? I'm glad you found a workaround, but I really do want to fix any problems with CSS selectors. That definitely should work! – Jamie Treworgy Jun 20 '12 at 07:07
  • The nth-child selector now works as of v1.1.1.22414, thanks! But I've run into another issue; I've posted it on GitHub, as I don't want to annoy the moderators here with this thread getting so large :) – Chris Cannon Jun 21 '12 at 19:28
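The `:nth-child(2)` versus `:eq(1)` exchange in the comments is worth a note, because the two are not interchangeable in general. `:nth-child(n)` is standard CSS: it is 1-based, counts an element's position among all of its siblings whatever their tag, and the rest of the selector (here `p`) must also match that element. `:eq(i)` is a jQuery extension: it is 0-based and indexes into the set that has already been filtered. The small model below shows where they diverge for a single parent; the `SelectorSemantics` class is hypothetical, represents the parent's children as a list of tag names, and is not CsQuery code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectorSemantics
{
    // Models `tag:nth-child(n)` for the children of a single parent:
    // n is 1-based and counts ALL siblings, and the child at that position
    // must also have the given tag. Returns its 0-based index, or -1.
    public static int NthChild(IList<string> children, string tag, int n)
    {
        int i = n - 1;
        bool inRange = i >= 0 && i < children.Count;
        return inRange && children[i] == tag ? i : -1;
    }

    // Models `tag:eq(index)`: filter the children to the given tag first,
    // then take the 0-based index within that filtered set. Returns the
    // chosen child's 0-based index among all children, or -1.
    public static int Eq(IList<string> children, string tag, int index)
    {
        List<int> matches = Enumerable.Range(0, children.Count)
            .Where(i => children[i] == tag)
            .ToList();
        return index >= 0 && index < matches.Count ? matches[index] : -1;
    }
}
```

With children `h1, p, p, p`, `p:nth-child(2)` picks the first `p` (index 1) while `p:eq(1)` picks the second `p` (index 2); the two only agree when every child up to the target position is a `p`.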