Hi, I have a resume in HTML format. I am reading the file with a StreamReader and removing the tags using the code below.

using (StreamReader sr = new StreamReader("\\Myfile.html"))
{
    String line = sr.ReadToEnd();
    string jj = Regex.Replace(line, "<.*?>", String.Empty);
}

It works well.

However, my requirement is to get only the data inside the body tag: without the body tag itself, and without any of the tags inside it.


1 Answer

Don't use regex to parse HTML or XML; use an HTML/XML parser instead. These questions explain well why you should not:

RegEx match open tags except XHTML self-contained tags

Can you provide some examples of why it is hard to parse XML and HTML with a regex?
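
To make the point concrete, here is a small sketch that runs the pattern from the question against two short inputs. The class name RegexStripDemo and the sample strings are made up for illustration; only the pattern comes from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexStripDemo
{
    // Same pattern as in the question.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, "<.*?>", String.Empty);
    }

    public static void Main()
    {
        // A '>' inside an attribute value ends the lazy match too early,
        // so part of the tag leaks into the output.
        Console.WriteLine(StripTags("<a title=\"a > b\">link</a>"));
        // prints:  b">link

        // Script content is not markup, so it survives as if it were text.
        Console.WriteLine(StripTags("<script>var x = 1;</script>Hello"));
        // prints: var x = 1;Hello
    }
}
```

Note also that `.` does not match a newline by default, so a tag that spans two lines would not be removed at all.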

You can load the HTML into a document using the HTML Agility Pack.

Here is a small example of how to do it:

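public string GetBodyText(string htmlFile)
{
    HtmlDocument doc = new HtmlDocument();
    doc.Load(htmlFile);

    // "//body" finds the body element anywhere in the document;
    // a plain "body" would only match a direct child of the document node.
    HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");

    // InnerText is the text content of the node with all tags removed.
    return body == null ? string.Empty : body.InnerText;
}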
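
A slightly fuller sketch for the original requirement (the text of the body only, with no tags), which also drops script and style blocks and decodes entities such as &amp;amp;. It assumes the HtmlAgilityPack package is referenced; the class and method names (BodyTextExtractor, Extract) are hypothetical, and the path passed in is whatever file you are reading.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class BodyTextExtractor
{
    // Hypothetical helper: returns the plain text inside <body>, or "" if there is none.
    public static string Extract(string htmlFile)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.Load(htmlFile);

        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        if (body == null)
        {
            return string.Empty;
        }

        // SelectNodes returns null when nothing matches, so guard before iterating.
        HtmlNodeCollection noise = body.SelectNodes(".//script|.//style");
        if (noise != null)
        {
            // Iterate over a copy so removing nodes cannot disturb the loop.
            foreach (HtmlNode node in new List<HtmlNode>(noise))
            {
                node.Remove();
            }
        }

        // InnerText strips the tags; DeEntitize turns entities like &amp; back into characters.
        return HtmlEntity.DeEntitize(body.InnerText).Trim();
    }
}
```

Usage would be something like `string text = BodyTextExtractor.Extract("\\Myfile.html");`. Whitespace from the original markup is kept as-is, so you may still want to collapse runs of blank lines afterwards.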