-2

I need to load a long string from the internet and I have done that. Now I need to find the H1 header tag and print the contents.

What is the shortest or the easiest way to do that?

for (int x = 0; x < tempString.Length; x++)
{

    if (write == 2)
    {
        name =name + tempString[x];
        lenght++;
    }
    if (tempString[x] == '<' && tempString[x] == 'h' && tempString[x] == '1' )
        write = 1;

    if (write == 1 && tempString[x] == '>')
        write = 2;

    if (tempString[x] == '-' && write == 1)
        write = 0;
}

I know it's a bit, how shall I say, odd, but it's all I have.

Mat
Shawn
  • Thanks for all of your help, but at the moment I can't tell who gave the best answer because I got a 404 error on the server connection – Shawn Jul 06 '12 at 14:28

5 Answers

6

Use the HTML Agility Pack - Pretty much anything else you try is just going to cause you a headache.

HTML Agility Pack sample:

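using HtmlAgilityPack;

var html = "<html><head></head><body><h1>hello</h1></body></html>";

HtmlDocument d = new HtmlDocument();
d.LoadHtml(html);

// SelectSingleNode returns null when the document has no <h1>, so check for that on real input.
var h1Contents = d.DocumentNode.SelectSingleNode("//h1").InnerText;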
DaveShaw
3

If you want to do it in plain C#, and you're only looking at one tag:

int first_tag = str.IndexOf("<H1>");
int last_tag = str.IndexOf("</H1>");
// Skip the 4 characters of "<H1>" and stop right before "</H1>".
string text = str.Substring(first_tag + 4, last_tag - first_tag - 4);
Nathan White
1

Use an HTML library!

Otherwise try:

String.IndexOf(String x )

http://msdn.microsoft.com/en-us/library/k8b1470s.aspx

You can use that to get the first index of the start and end tags. You can then just read between those indices.

Colin D
1

The System.String class has methods like IndexOf(String), which reports the zero-based index of the first occurrence of the specified string.

So in your case, you could pass in "<H1>". Then you could get a substring starting at that point, and then call this method again looking for "</H1>".
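The two-step search described above could be sketched roughly like this. It is only a minimal sketch, not a robust parser: the names H1Finder and ExtractFirstH1 are made up for illustration, it assumes a single, non-nested h1 in reasonably well-formed markup, and it passes StringComparison.OrdinalIgnoreCase so that both "<h1>" and "<H1>" match.

```csharp
using System;

public static class H1Finder
{
    // Hypothetical helper (not from the original answers): returns the text
    // between the first <h1 ...> tag and the next </h1>, or null if not found.
    public static string ExtractFirstH1(string html)
    {
        // Find the opening tag; searching for "<h1" also matches tags with attributes.
        int open = html.IndexOf("<h1", StringComparison.OrdinalIgnoreCase);
        if (open < 0) return null;

        // The content starts right after the '>' that closes the opening tag.
        int start = html.IndexOf('>', open);
        if (start < 0) return null;
        start++;

        // Look for the closing tag only from the content start onwards.
        int end = html.IndexOf("</h1>", start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        return html.Substring(start, end - start);
    }
}
```

Calling H1Finder.ExtractFirstH1(tempString) would then give the header text, or null when no complete h1 element is present.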

Or if you want, it might be easier to use Regular Expressions in .NET. Those are found in the System.Text.RegularExpressions namespace. Those are definitely more complicated. But I'm sure you could practice using some small samples and learn the power of the dark side! (errr....) the power of regular expressions! :)
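For a single tag, a regular expression version might look something like the sketch below. The class name H1Regex and the pattern are assumptions made for illustration; the pattern only handles a simple, non-nested h1 and is not a general way to parse HTML.

```csharp
using System;
using System.Text.RegularExpressions;

public static class H1Regex
{
    // Hypothetical helper: returns the contents of the first h1 element,
    // or null when the input has no complete <h1>...</h1> pair.
    public static string ExtractFirstH1(string html)
    {
        // <h1\b[^>]*>  opening tag, with optional attributes
        // (.*?)        lazily captured contents
        // </h1>        closing tag
        // IgnoreCase matches <H1> as well; Singleline lets '.' span line breaks.
        Match m = Regex.Match(
            html,
            @"<h1\b[^>]*>(.*?)</h1>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        return m.Success ? m.Groups[1].Value : null;
    }
}
```

The lazy quantifier keeps the match from running on to a later closing tag when the page contains more than one h1.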

[edit] Now that I see the other answers, I definitely agree with them. If you need to do anything more complicated than getting one item out of an HTML-formatted string, use an HTML parser.

C.J.
0

All of the above work fine; I just can't use any external libraries.

This works well for me:

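for (int x = 0; x < tempString.Length; x++)
{
    // A '-' while copying ends the title.
    if (tempString[x] == '-' && write == 2)
    { write = 0; }

    // Copy characters while inside the header.
    if (write == 2)
    {
        title = title + tempString[x];
        lenght++;
    }

    // Look for "<h1"; the bounds check keeps x + 1 and x + 2 inside the string.
    if (x + 2 < tempString.Length && tempString[x] == '<' && tempString[x + 1] == 'h' && tempString[x + 2] == '1')
    { write = 1; }

    // The '>' that closes the opening tag starts the copy on the next character.
    if (write == 1 && tempString[x] == '>')
    { write = 2; }
}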
Shawn
  • IndexOf is not an external library. You have basically the same code as what you say has been rejected by your college three times. Have you asked your professor WHY it's been rejected three times? – JohnP Jul 06 '12 at 14:57
  • Yes, pure stupidity. The first time there were no language tags, which I don't need because it's all in English. The second time I parsed the link title instead of the paragraph title; the only difference is that sometimes there is an uppercase letter instead of a lowercase one – Shawn Jul 06 '12 at 15:04