I know how to read a line from a txt file, but for some reason C# is not detecting the end of line in HTML files. This code opens the HTML file and reads it line by line in search of the specified string. Even when I just try to print the first line of text in the HTML file, nothing is displayed.

    using (StreamReader sr = new StreamReader("\\\\server\\myFile.html"))
    {
        String line;
        while ((line = sr.ReadLine()) != null)
        {
            if(line == ("<td><strong>String I wantstrong></td>"))
            {
                Label1.Text = "Text Found";
                break;
            }
        }
    }

I have tried this using a plain txt file and it works perfectly, just not when trying to parse an HTML file.

Thanks.

Rupert
  • The ending `strong` is the ending tag. – George Stocker Jan 14 '11 at 01:21
  • Sorry, I messed up when copying and pasting; the '<' is there in my code. Also, the '(' and ')' parentheses are not in my code. – Rupert Jan 14 '11 at 01:21
  • Is there anything in the file? Does the user running the application have permission to use that network resource? Does this code work if you copy the file locally? If you break in the loop, is the breakpoint hit? It seems to me that the debug work that needs to be done here is fairly straightforward... – cdhowie Jan 14 '11 at 01:22
  • You'll get an error if you try to read a file that you don't have permissions for (or otherwise doesn't exist). But whether or not it has content in it... well ;) – Alan Jan 14 '11 at 01:38

4 Answers

The best way by far is to use the HTML Agility Pack.

More about this can be found in a previous Stack Overflow question:

Looking for C# HTML parser

Gaven

You don't need to reinvent the wheel. A much better way to parse HTML is to use an HTML parser:

http://htmlagilitypack.codeplex.com/ or http://www.justagile.com/linq-to-html.aspx

A similar question is here: What is the best way to parse html in C#?

Hope it helps.

angularrocks.com

If you know the HTML you are parsing is XHTML, why not parse it as XML using System.Xml?
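
A minimal sketch of that idea, assuming the input is well-formed XHTML held in a string. It uses LINQ to XML (System.Xml.Linq) rather than the older System.Xml classes, which is a choice made here for brevity. The XhtmlSearch class and HasStrongCell method are hypothetical names. Real-world XHTML with a DOCTYPE or named entities such as &nbsp; may need extra reader settings, and ordinary non-XHTML pages will likely fail to parse at all.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class XhtmlSearch
{
    // Hypothetical helper: returns true if the document contains a
    // <td><strong>text</strong></td> cell whose text equals `text`.
    // Matching on LocalName ignores any XHTML namespace on the elements.
    public static bool HasStrongCell(string xhtml, string text)
    {
        XDocument doc = XDocument.Parse(xhtml); // throws XmlException if not well-formed
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == "strong"
                              && e.Parent != null
                              && e.Parent.Name.LocalName == "td")
                  .Any(e => e.Value.Trim() == text);
    }
}
```

Because the parser works on elements rather than lines, indentation and line breaks in the file no longer affect the match.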

KK99

Your outer loop that reads lines works fine. My guess is that one of the following is taking place:

  • The HTML file is empty
  • The first line in the HTML file is empty

In either case, you won't see anything printed.

Now, to your loop:

You likely don't see what you expect, because

 if(line == ("<td><strong>String I wantstrong></td>"))
 {
    Label1.Text = "Text Found";
    break;
 }

Looks for an EXACT match against the whole line. If this is your actual code, you're missing the opening </ on </strong>, and you're likely forgetting that there is white space (indentation) in your HTML content, so the comparison fails even when the markup is on that line.
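
A minimal sketch of a more forgiving check, assuming the target markup sits entirely on one line of the file: trim each line and test with Contains instead of ==. The LineSearch class and ContainsLine method are hypothetical names used for illustration, not part of the original code.

```csharp
using System;
using System.IO;

static class LineSearch
{
    // Hypothetical helper: returns true if any line, after trimming
    // leading and trailing white space, contains the target markup.
    public static bool ContainsLine(TextReader reader, string target)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Contains(target))
                return true;
        }
        return false;
    }
}

// Possible usage with the file from the question:
//   using (var sr = new StreamReader("\\\\server\\myFile.html"))
//   {
//       if (LineSearch.ContainsLine(sr, "<td><strong>String I want</strong></td>"))
//           Label1.Text = "Text Found";
//   }
```

This still breaks if the tag is split across lines or carries attributes, which is why the parser-based answers are the more robust route.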

Alan