1

Simple:

I get this as MessageBody xItem.Body:

"<html>\r\n<head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\r\n</head>\r\n<body>\r\nDies ist test nummer 3\r\n</body>\r\n</html>\r\n"

I only need to save the content between `<body>\r\n` and `\r\n</body>`, like:

m_Description = xItem.Body;

What's the easiest way?

Luca
  • 13
    Have you ever tried using [Html Agility Pack](http://htmlagilitypack.codeplex.com/)? It is the best-known HTML parser. – Soner Gönül Dec 16 '14 at 09:04
  • 3
    Definitely use an external parser, as mentioned above. It's generally [considered bad practice](http://stackoverflow.com/a/1732454/791010) to use regex for parsing html – James Thorpe Dec 16 '14 at 09:06
  • Will try it, thanks. But I'm looking for something without extensions, or is there no other way? – Luca Dec 16 '14 at 09:06
  • If it wasn't for that `<meta>` tag I'd say you could probably use `System.Xml.XmlDocument` to parse it. – Phylogenesis Dec 16 '14 at 09:08
  • 4
    You could use simple `IndexOf` and `Substring` operations on the strings. However, you can also use a spoon to chop wood. It's not necessarily the best approach. – Mathias Lykkegaard Lorenzen Dec 16 '14 at 09:08
  • 1
    Indeed. Start with the proper tool; then, when the requirements change in 2 weeks' time, it'll be a simple change rather than a major rewrite. – James Thorpe Dec 16 '14 at 09:09
  • There is already a regex-free solution to that question here: http://stackoverflow.com/questions/1717611/regex-c-sharp-find-a-string-between-2-known-values – Mik Dec 16 '14 at 09:30
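
The parser-based approach recommended in the comments could look roughly like the sketch below. It assumes the external Html Agility Pack NuGet package is installed; the class and method names (`HapBodyExtractor`, `ExtractBodyText`) are hypothetical and chosen only for illustration.

```csharp
using System;
using HtmlAgilityPack; // external NuGet package, not part of the .NET base library

public static class HapBodyExtractor
{
    // Hypothetical helper: returns the trimmed text inside <body>, or "" if there is none.
    public static string ExtractBodyText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return "";
        }

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant of non-XML markup such as the unclosed <meta> tag

        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");

        // InnerText drops nested tags; use InnerHtml instead if the markup should be kept.
        return body == null ? "" : body.InnerText.Trim();
    }

    public static void Main()
    {
        string html = "<html>\r\n<head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\r\n</head>\r\n<body>\r\nDies ist test nummer 3\r\n</body>\r\n</html>\r\n";
        Console.WriteLine(ExtractBodyText(html));
    }
}
```

Unlike the string-based approaches, this does not depend on the exact spelling of the `<body>` tag or on the line breaks around it.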

2 Answers

2

Thanks for your feedback regarding the external tool. I will use one in the future, but for this problem I wrote this function:

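    private string ExtractBetweenBodyTags(string str1)
    {
        const string openTag = "<body>\r\n";
        const string closeTag = "\r\n</body>";

        if (!string.IsNullOrEmpty(str1))
        {
            int p1 = str1.IndexOf(openTag);
            if (p1 > -1)
            {
                // Everything after the opening tag and its line break.
                string str2 = str1.Substring(p1 + openTag.Length);
                int p2 = str2.IndexOf(closeTag);
                if (p2 > -1)
                {
                    // p2 is already the content length; p2 - 1 would cut off the last character.
                    return str2.Substring(0, p2);
                }
            }
        }
        return "";
    }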

And had no problems using it.

I think we can close that :)

Luca
  • 1
    If you solved your own problem, you can mark it as an accepted solution. It's encouraged to do so. – Kyle Muir Dec 16 '14 at 09:13
  • You _might_ want to use [`LastIndexOf`](http://msdn.microsoft.com/en-us/library/system.string.lastindexof%28v=vs.110%29.aspx) for finding `</body>`, just in case there's a chance that it might be found elsewhere (inside cdata perhaps?), and also because it's more likely to be towards the end of the string, so may be more performant – James Thorpe Dec 16 '14 at 09:19
  • @JamesThorpe Yes, great, this was the last input I needed; I can use `LastIndexOf`. And @Kyle Muir, I can only accept it in 2 days, but I will. – Luca Dec 16 '14 at 09:24
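
The `LastIndexOf` suggestion from the comments could be sketched as follows. This is a minimal variant rather than the code the author ended up with: the names `BodyExtractor` and `ExtractBody` are hypothetical, and it assumes the opening tag is a plain `<body>` without attributes. It trims surrounding whitespace instead of matching the `\r\n` explicitly.

```csharp
using System;

public static class BodyExtractor
{
    // Hypothetical helper: returns the trimmed content between <body> and the last </body>.
    public static string ExtractBody(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return "";
        }

        const string openTag = "<body>";   // assumption: no attributes on the tag
        const string closeTag = "</body>";

        int start = html.IndexOf(openTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
        {
            return "";
        }
        start += openTag.Length;

        // Search from the end so an earlier "</body>" inside the content is not picked up.
        int end = html.LastIndexOf(closeTag, StringComparison.OrdinalIgnoreCase);
        if (end < start)
        {
            return ""; // closing tag missing or located before the opening tag
        }

        return html.Substring(start, end - start).Trim();
    }

    public static void Main()
    {
        string html = "<html>\r\n<head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\r\n</head>\r\n<body>\r\nDies ist test nummer 3\r\n</body>\r\n</html>\r\n";
        Console.WriteLine(ExtractBody(html)); // prints "Dies ist test nummer 3"
    }
}
```

The ordinal comparison avoids culture-dependent matching, which can behave unexpectedly around `\r\n` sequences.
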
1

With Regex:

    // Requires: using System.Text.RegularExpressions;
    Regex regex = new Regex(@"(?<=<body>).*?(?=</body>)", RegexOptions.Singleline);
    string body = regex.Match(source).ToString();

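
Note that the match still includes the line breaks right after `<body>` and before `</body>`, and that an unsuccessful match yields an empty string. A slightly more defensive sketch, with the hypothetical names `RegexBodyExtractor` and `ExtractBody`, and with `RegexOptions.IgnoreCase` added as an assumption about the input, might be:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexBodyExtractor
{
    // Same lookbehind/lookahead pattern as above; Singleline lets '.' match line breaks.
    private static readonly Regex BodyRegex = new Regex(
        @"(?<=<body>).*?(?=</body>)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the trimmed body content, or "" if nothing matches.
    public static string ExtractBody(string source)
    {
        if (string.IsNullOrEmpty(source))
        {
            return ""; // Regex.Match would throw on null input
        }

        Match match = BodyRegex.Match(source);
        return match.Success ? match.Value.Trim() : "";
    }

    public static void Main()
    {
        string source = "<html>\r\n<head>\r\n</head>\r\n<body>\r\nDies ist test nummer 3\r\n</body>\r\n</html>\r\n";
        Console.WriteLine(ExtractBody(source)); // prints "Dies ist test nummer 3"
    }
}
```

Because the quantifier is lazy, this stops at the first `</body>`, and the pattern does not match a `<body>` tag that carries attributes.
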
Florian Schmidinger