
Is there a way to get a page to parse itself?

So far I have:

string whatever = TwitterSpot.InnerHtml;

HtmlDocument doc = new HtmlDocument();

doc.LoadHtml(whatever);

foreach (/* this is where I am stuck */)
{

}

I want to parse the page, so I created a parent div named TwitterSpot, put its InnerHtml into a string, and loaded that string into a new HtmlDocument.

Next, I want to find each string of the form "#XXXX+n " within that HTML and replace it on the page with some cool formatting.

I am stuck on my foreach loop: I do not know how to search for a # or how to look through the loaded HtmlDocument.

The next step is to apply the change wherever I find a # tag. I know this would probably be a lot easier in JavaScript, but I am adamant about seeing how to do it in ASP.NET C#.

The # is a string value within the HTML; I am not referring to a control ID.

Anicho
  • It sounds like you're reinventing the wheel here. Why not just use server controls and plug in your text with Page.FindControl? – Tim Feb 02 '12 at 20:02
  • @Tim: If there is a better way, please share it or point me in the right direction. I will accept alternative ASP.NET C# solutions. – Anicho Feb 02 '12 at 20:03
  • @Tim: Just to clarify, I am not trying to pick up a `Control ID`, just plain text. – Anicho Feb 02 '12 at 20:23

5 Answers


Assuming you're using HtmlAgilityPack, you could use XPath to find the text nodes that contain your value:

var matchedNodes = document.DocumentNode
              .SelectNodes("//text()[contains(.,'#XXXX+n ')]");

Then you can iterate through these nodes and make the necessary replacements:

if (matchedNodes != null) // SelectNodes returns null, not an empty list, when nothing matches
{
    foreach (HtmlTextNode node in matchedNodes)
    {
        node.Text = node.Text.Replace("#XXXX+n ", "brand new text");
    }
}
Oleks

I guess you could use RegEx to find all matches and loop through them.
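As a rough illustration of that suggestion, here is a minimal C# sketch that finds every match with Regex.Matches and loops through them, wrapping each one in <b>...</b>. The HashTagHighlighter class and the tag pattern are hypothetical, assuming a tag is '#' followed by word characters:

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class HashTagHighlighter
{
    // Hypothetical tag pattern: '#' followed by one or more word characters
    // (letters, digits, '_'). The lookbehind skips numeric entities such as
    // &#39; and a '#' in the middle of a word. Widen \w if tags can contain '+'.
    private static readonly Regex HashTag = new Regex(@"(?<![&\w])#\w+");

    // Finds every match, loops through them, and wraps each one in <b>...</b>.
    public static string Highlight(string html)
    {
        var result = new StringBuilder();
        int last = 0;

        foreach (Match m in HashTag.Matches(html))
        {
            // Copy the text between the previous match and this one unchanged.
            result.Append(html, last, m.Index - last);
            // Wrap the matched tag in the new formatting.
            result.Append("<b>").Append(m.Value).Append("</b>");
            last = m.Index + m.Length;
        }

        // Copy everything after the final match.
        result.Append(html, last, html.Length - last);
        return result.ToString();
    }
}
```

Because this works on the raw HTML string, it will still match a '#' inside an attribute (href="#top") or a CSS color (#fff); the lookbehind only rules out numeric entities such as &#39; and a '#' in the middle of a word. That trade-off is what the comments on this answer argue about.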

Sebastian Siek
  • Obligatory: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – NotMe Feb 02 '12 at 20:40
  • Chris, the requirement is to replace a string within a string (which happens to be in HTML format). You do not need to parse the HTML elements, since you aren't interested in attributes etc. – Sebastian Siek Feb 02 '12 at 20:55
  • If you want to parse HTML, use IHTMLDocument from Microsoft.mshtml or the Argotic Syndication Framework http://argotic.codeplex.com/ – Sebastian Siek Feb 02 '12 at 21:07
  • I am pretty sure that if you can parse any form of text, you should parse it rather than loop through it. My experience tells me it's industry best practice. – Anicho Feb 03 '12 at 09:35

You can use http://htmlagilitypack.codeplex.com/ to parse HTML and manipulate its content; works very well.

  • If you look at the tags, the question is already tagged HtmlAgilityPack, and `HtmlDocument` comes from that package. – Anicho Feb 03 '12 at 09:25

You could just change it to be:

string whatever = TwitterSpot.InnerHtml;

whatever = whatever.Replace("#XXXX+n ", String.Format("<b>{0}</b>", "#XXXX+n "));

No parsing required...

NotMe
  • This does meet the requirements above. Any idea how I can replace the HTML in the page with what's in the string whatever? – Anicho Feb 03 '12 at 09:33
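One way to answer that, sketched under the assumption that TwitterSpot is declared as <div id="TwitterSpot" runat="server"> (the question already reads its InnerHtml on the server): do the replacement in a small helper and assign the result back to the div's InnerHtml property. The TagReplacer class and Bold method are hypothetical names; the method body is NotMe's Replace call unchanged:

```csharp
using System;

public static class TagReplacer
{
    // Wraps every occurrence of tag in <b>...</b> using a plain string replace.
    public static string Bold(string html, string tag)
    {
        return html.Replace(tag, String.Format("<b>{0}</b>", tag));
    }
}

// Assumed usage from the page's code-behind (e.g. in Page_Load), with
// TwitterSpot declared as a server-side div:
//     TwitterSpot.InnerHtml = TagReplacer.Bold(TwitterSpot.InnerHtml, "#XXXX+n ");
```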

When I did this before, I stored the HTML in an XML doc and looped through each node. You can then apply XSLT or just parse the nodes.
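A minimal sketch of that XML-document approach, using only System.Xml and assuming the markup is well-formed XHTML (XmlDocument.LoadXml throws an XmlException on ordinary HTML such as an unclosed <br> or &nbsp;). The XmlHashTagFormatter name and the #\w+ pattern are hypothetical. Because it selects text nodes with XPath, attribute values such as href="#top" are left alone:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

public static class XmlHashTagFormatter
{
    // Hypothetical tag pattern: '#' followed by one or more word characters.
    private static readonly Regex HashTag = new Regex(@"#\w+");

    // Assumes well-formed XML (XHTML); LoadXml throws XmlException otherwise.
    public static string Format(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        // Copy the matching text nodes into a list first, because the loop
        // below inserts new nodes into the tree.
        var textNodes = new List<XmlNode>();
        foreach (XmlNode n in doc.SelectNodes("//text()[contains(., '#')]"))
        {
            textNodes.Add(n);
        }

        foreach (XmlNode textNode in textNodes)
        {
            string text = textNode.Value;
            XmlNode parent = textNode.ParentNode;
            int last = 0;

            foreach (Match m in HashTag.Matches(text))
            {
                // Text before the tag stays plain text.
                if (m.Index > last)
                {
                    parent.InsertBefore(doc.CreateTextNode(text.Substring(last, m.Index - last)), textNode);
                }

                // The tag itself becomes a <b> element in the parent's namespace.
                XmlElement bold = doc.CreateElement("b", parent.NamespaceURI);
                bold.InnerText = m.Value;
                parent.InsertBefore(bold, textNode);
                last = m.Index + m.Length;
            }

            // Whatever follows the last tag stays in the original text node.
            textNode.Value = text.Substring(last);
        }

        return doc.OuterXml;
    }
}
```

For example, Format("<p><a href=\"#top\">go #up</a></p>") should return <p><a href="#top">go <b>#up</b></a></p>: the link text is formatted, but the href attribute is not.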

For your purposes, though, it sounds like you don't really need to do that. I'd recommend making the divs into server controls and looping through their child controls programmatically, like this:

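using System.Web.UI;
using System.Web.UI.WebControls;

foreach (Control c in divSomething.Controls)
{
    // Only touch the TextBox with the ID we are looking for
    TextBox tb = c as TextBox;
    if (tb != null && tb.ID == "txtSomething")
    {
        tb.Attributes.Add("style", "font-family: Arial; color: Red;");
    }
}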
  • Yes, except I want to get "#myHash", which is a string value, not a control ID. – Anicho Feb 02 '12 at 20:18
  • Well, how are you storing the #myHash? I just put ID and TextBox there as an example; you can access any attribute you like. So replace ID with Text or Value if you're using an HTML control. – Justin Moore Feb 02 '12 at 20:24