How can I find a word in a C# string

Question

I need to load a long string from the internet and I have done that. Now I need to find the H1 header tag and print the contents.

What is the shortest or the easiest way to do that?

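// tempString holds the downloaded page.
int write = 0;   // 0 = searching, 1 = inside the <h1 ...> tag, 2 = copying text
string name = "";
int length = 0;

for (int x = 0; x < tempString.Length; x++)
{
    // Once write == 2, every remaining character is copied (nothing here stops at </h1>).
    if (write == 2)
    {
        name = name + tempString[x];
        length++;
    }

    // Look ahead at the next two characters (staying inside the string) to spot "<h1".
    if (x + 2 < tempString.Length &&
        tempString[x] == '<' && tempString[x + 1] == 'h' && tempString[x + 2] == '1')
        write = 1;

    // The '>' that closes the opening tag switches to copying.
    if (write == 1 && tempString[x] == '>')
        write = 2;

    if (tempString[x] == '-' && write == 1)
        write = 0;
}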

I know it's a bit, how shall I say, odd. But it's all I have.

Thanks for all of your help, but at the moment I can't tell who gave the best answer because I got a 404 error on the server connection. – Shawn Jul 06 '12 at 14:28

DaveShaw · Accepted Answer · score 6 · 2017-11-26T21:44:48.950

Use the HTML Agility Pack. Pretty much anything else you try is just going to cause you a headache.

HTML Agility Pack sample:

using HtmlAgilityPack;

var html = "<html><head></head><body><h1>hello</h1></body></html>";

HtmlDocument d = new HtmlDocument();
d.LoadHtml(html);

// SelectSingleNode returns null when there is no <h1>, hence the ?.
var h1Contents = d.DocumentNode.SelectSingleNode("//h1")?.InnerText;

edited Nov 26 '17 at 21:44

answered Jul 06 '12 at 14:17

DaveShaw


Well, not *anything* else, but most things people try will. – Jeremy Holovacs Jul 06 '12 at 14:18
@JeremyHolovacs Added the obligatory answer as to why Regex doesn't work ;) – DaveShaw Jul 06 '12 at 14:26

score 3 · Answer 2 · answered Jul 06 '12 at 14:20


If you want to do it in plain C#, and you're only looking for one tag:

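// Case-insensitive, so both <h1> and <H1> match.
int first_tag = str.IndexOf("<H1>", StringComparison.OrdinalIgnoreCase);
int last_tag = str.IndexOf("</H1>", StringComparison.OrdinalIgnoreCase);
// Skip past "<H1>" (4 characters), then take everything up to "</H1>".
string text = str.Substring(first_tag + 4, last_tag - (first_tag + 4));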

answered Jul 06 '12 at 14:20

Nathan White


You might need some offsets in your Substring call; otherwise it will contain the <H1> text. – Colin D Jul 06 '12 at 14:22
Thanks. Added + 4 (the length of the <H1> tag). – Nathan White Jul 06 '12 at 14:25
What happens when that heading tag has a class assigned to it? – Jeremy Holovacs Jul 06 '12 at 15:01
@JeremyHolovacs I completely understand; in that case you would need a few more checks, but I don't think that is what the OP requires. If the OP does need it, I'm sure we can work it out :D – Nathan White Jul 06 '12 at 15:54
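For the class-attribute case, here is a minimal sketch that still sticks to IndexOf and Substring but tolerates a tag like <h1 class="title">. The helper name ExtractH1WithAttributes is made up for illustration, and it assumes the first '>' after "<h1" closes the tag, so a '>' inside an attribute value would still break it.

```csharp
using System;

// ExtractH1WithAttributes is a hypothetical helper, not a library API.
// It finds the first "<h1" tag (any case, with or without attributes),
// skips to the first '>' after it, and returns the text up to the next
// "</h1>". It returns null when no complete <h1>...</h1> pair is found.
static string? ExtractH1WithAttributes(string html)
{
    int search = 0;
    while (true)
    {
        int tagStart = html.IndexOf("<h1", search, StringComparison.OrdinalIgnoreCase);
        if (tagStart < 0) return null;

        // Accept "<h1>" or "<h1 ...>", but skip look-alikes such as "<h1x>".
        int after = tagStart + 3;
        if (after < html.Length && (html[after] == '>' || char.IsWhiteSpace(html[after])))
        {
            // Assumes the first '>' closes the tag ('>' inside an attribute value breaks this).
            int tagEnd = html.IndexOf('>', after);
            if (tagEnd < 0) return null;

            int closeStart = html.IndexOf("</h1>", tagEnd + 1, StringComparison.OrdinalIgnoreCase);
            if (closeStart < 0) return null;

            return html.Substring(tagEnd + 1, closeStart - tagEnd - 1);
        }

        search = after; // not an <h1> tag; keep looking further on
    }
}

Console.WriteLine(ExtractH1WithAttributes("<h1 class=\"title\">Hello</h1>")); // Hello
```

It only returns the first heading; calling it in a loop with a moving start position would be needed to collect several.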

score 1 · Answer 3 · answered Jul 06 '12 at 14:18


Use an HTML library!

Otherwise try:

String.IndexOf(String)

http://msdn.microsoft.com/en-us/library/k8b1470s.aspx

You can use that to get the first index of the start and end tags, and then just read the text between those indices.

answered Jul 06 '12 at 14:18

Colin D


score 1 · Answer 4 · answered Jul 06 '12 at 14:22

The System.String class has methods like IndexOf(String), which "reports the zero-based index of the first occurrence of the specified string."

So in your case, you could pass in "<H1>". Then you could take a substring starting at that point and call this method again, this time looking for "</H1>".
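A minimal sketch of that two-step search, assuming a bare <H1> tag with no attributes. The helper name BetweenH1Tags is hypothetical, and it returns null when either tag is missing instead of throwing:

```csharp
using System;

// BetweenH1Tags is a hypothetical helper. It follows the steps described
// above: find "<H1>", take the substring after it, then search that
// remainder for "</H1>". Returns null if either tag is missing.
static string? BetweenH1Tags(string str)
{
    const string open = "<H1>";
    const string close = "</H1>";

    // First search: where the opening tag starts (any letter case).
    int start = str.IndexOf(open, StringComparison.OrdinalIgnoreCase);
    if (start < 0) return null;

    // Everything after the opening tag.
    string rest = str.Substring(start + open.Length);

    // Second search, for the closing tag, inside the remainder only.
    int end = rest.IndexOf(close, StringComparison.OrdinalIgnoreCase);
    if (end < 0) return null;

    return rest.Substring(0, end);
}

Console.WriteLine(BetweenH1Tags("<html><h1>Title</h1></html>")); // Title
```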

Or, if you want, it might be easier to use regular expressions in .NET. Those are found in the System.Text.RegularExpressions namespace. They are definitely more complicated, but I'm sure you could practice on some small samples and learn the power of the dark side! (errr....) the power of regular expressions! :)
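For comparison, a small regex sketch using Regex.Match from System.Text.RegularExpressions. The pattern and the helper name FirstH1 are illustrative assumptions, not a robust HTML parser: it finds only the first heading and will misbehave on nested tags, comments, or a '>' inside an attribute value.

```csharp
using System;
using System.Text.RegularExpressions;

// FirstH1 is a hypothetical helper. The pattern matches "<h1", optional
// attributes up to '>', then captures the shortest text before "</h1>".
// IgnoreCase also matches <H1>; Singleline lets '.' cross line breaks.
static string? FirstH1(string html)
{
    Match m = Regex.Match(html, @"<h1(?:\s[^>]*)?>(.*?)</h1>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(FirstH1("<div><H1 class=\"x\">Hello</H1></div>")); // Hello
```

Because the capture is lazy (.*?), it stops at the first "</h1>"; Regex.Matches with the same pattern would return every heading.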

[edit] Now that I see the other answers, I definitely agree with them. If you need to do anything more complicated than getting one item out of an HTML-formatted string, use an HTML parser.

score 0 · Answer 5 · answered Jul 06 '12 at 14:47


All of the above work fine; I just can't use any external libraries.

This works well for me:

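int write = 0;   // 0 = searching, 1 = inside the <h1 ...> tag, 2 = copying text
string title = "";
int length = 0;

for (int x = 0; x < tempString.Length; x++)
{
    // A '-' ends the copy.
    if (tempString[x] == '-' && write == 2)
    { write = 0; }

    if (write == 2)
    {
        title = title + tempString[x];
        length++;
    }

    // Look ahead for "<h1", checking that x + 2 is still inside the string.
    if (x + 2 < tempString.Length &&
        tempString[x] == '<' && tempString[x + 1] == 'h' && tempString[x + 2] == '1')
    { write = 1; }

    // The '>' that closes the opening tag switches to copying.
    if (write == 1 && tempString[x] == '>')
    { write = 2; }
}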

answered Jul 06 '12 at 14:47

Shawn


IndexOf is not an external library. You have basically the same code as what you say has been rejected by your college three times. Have you asked your professor WHY it's been rejected 3x? – JohnP Jul 06 '12 at 14:57
Yes, pure stupidity. The first time, there were no language tags, which I don't need because it's all in English. The second time, I parsed the link title instead of the paragraph title; the only difference is that sometimes there is an uppercase letter instead of a lowercase one. – Shawn Jul 06 '12 at 15:04

5 Answers
