
I want to use a regex in C# to match certain content.

For example, the input string would be the following:

7687687toyi7fy
<body style="box-sizing: border-box; height: inherit; width: inherit; 
margin: 0px; overflow: hidden">
lkjlknkjyyugtfiytfif
</body>
</html>

Now I want to match whatever is between <body ... hidden> and </body>. So for the above example, I would want to match "lkjlknkjyyugtfiytfif".

I tried the pattern <body(.+?)>(.+?)</body>, but for some reason it does not match anything.

For debugging, I also tried just <body(.+?)>. It matches <body ... hidden> successfully, but whatever I add after <body(.+?)>, I cannot get what I want.

Any suggestions would be appreciated.

Thanks!

yhm
  • Don't use regex to parse HTML. Use an actual HTML parser library. – gunr2171 Jun 19 '18 at 18:55
  • Use [HTML Agility Pack](http://html-agility-pack.net/) or [AngleSharp](https://anglesharp.github.io/) – maccettura Jun 19 '18 at 18:56
  • I'm voting to close this question as off-topic because you can read here why HTML parsing using regex is impossible --> **https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags** – Peter B Jun 19 '18 at 18:58
  • Perhaps your issue is multiline: by default `.` doesn't match a newline. Try replacing `.` with `(?:.|[\r\n])`, or see [RegexOptions.Singleline](https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regexoptions?view=netframework-4.7.2#System_Text_RegularExpressions_RegexOptions_Singleline). Show your code. – NetMage Jun 19 '18 at 19:03
  • Take a look at this: http://www.java2s.com/Book/CSharp/0220__Regular-Expressions/Parsing_an_XMLHTML_tag.htm – Bactos Jun 19 '18 at 19:23
  • @NetMage Thanks, I think that's the reason. I am not very familiar with regex; I am going to read more about multiline. Thank you – yhm Jun 19 '18 at 19:26
  • @gunr2171, maccettura and Peter, good suggestion. I know it's better. For now I am just doing a small task, but I will definitely learn these libraries. Thank you – yhm Jun 19 '18 at 19:29
  • @Bactos, thanks, that helps – yhm Jun 19 '18 at 19:30
  • @yhm look at my example, maybe it helps – Hitesh Anshani Jun 19 '18 at 19:38
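NetMage's diagnosis can be checked directly. The minimal sketch below runs the asker's pattern three ways: with default options, with `RegexOptions.Singleline`, and with the `(?:.|[\r\n])` replacement from the comment. The `SinglelineDemo` class and its members are illustrative names, and the sample string reproduces the question's input under the assumption that its line breaks are plain `\n`.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative helper (hypothetical name) for comparing the three variants.
static class SinglelineDemo
{
    // The question's input; the <body ...> tag and the content both span line breaks.
    public const string Input =
        "7687687toyi7fy\n" +
        "<body style=\"box-sizing: border-box; height: inherit; width: inherit; \n" +
        "margin: 0px; overflow: hidden\">\n" +
        "lkjlknkjyyugtfiytfif\n" +
        "</body>\n" +
        "</html>";

    // The asker's original pattern: without Singleline, '.' refuses to match '\n'.
    public const string Original = @"<body(.+?)>(.+?)</body>";

    // The comment's workaround: each lazy run may consume any char, including '\r' and '\n'.
    public const string Workaround = @"<body((?:.|[\r\n])+?)>((?:.|[\r\n])+?)</body>";

    public static bool MatchesDefault() => Regex.IsMatch(Input, Original);

    public static bool MatchesSingleline() => Regex.IsMatch(Input, Original, RegexOptions.Singleline);

    public static bool MatchesWorkaround() => Regex.IsMatch(Input, Workaround);
}

class Program
{
    static void Main()
    {
        Console.WriteLine(SinglelineDemo.MatchesDefault());     // False
        Console.WriteLine(SinglelineDemo.MatchesSingleline());  // True
        Console.WriteLine(SinglelineDemo.MatchesWorkaround());  // True
    }
}
```

`RegexOptions.Singleline` is usually the cleaner choice, since it keeps the pattern readable; the `(?:.|[\r\n])` form is handy when the options argument cannot be changed.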

2 Answers


Use this regular expression; you will get your result in the first capture group:

<body.*?>(.*?)<\/body>

Live example:

https://regex101.com/r/JLxRzU/2

Hitesh Anshani
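In C#, the regex from the answer above could be applied roughly as follows. This is a sketch: `BodyExtractor` and `ExtractBody` are hypothetical names, and `RegexOptions.Singleline` is added on the assumption that the body spans several lines, as it does in the question's input (without it, `.` stops at `\n` and nothing matches).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the answer's regex.
static class BodyExtractor
{
    // The answer's pattern: group 1 captures everything between <body ...> and </body>.
    // "\/" is just an escaped "/", which .NET also accepts unescaped.
    private const string Pattern = @"<body.*?>(.*?)<\/body>";

    // Assumption: Singleline is added so '.' can cross the line breaks in the input.
    // Returns the trimmed text of group 1, or null when no <body>...</body> pair is found.
    public static string ExtractBody(string html)
    {
        Match m = Regex.Match(html, Pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }
}

class Program
{
    static void Main()
    {
        string html =
            "7687687toyi7fy\n" +
            "<body style=\"box-sizing: border-box; height: inherit; width: inherit; \n" +
            "margin: 0px; overflow: hidden\">\n" +
            "lkjlknkjyyugtfiytfif\n" +
            "</body>\n" +
            "</html>";

        Console.WriteLine(BodyExtractor.ExtractBody(html)); // lkjlknkjyyugtfiytfif
    }
}
```

The lazy `.*?` inside `<body.*?>` stops at the first `>`, so the attributes are skipped; `Trim()` removes the line breaks that surround the content.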

Since my comment helped, I will post it here as well for others to see:

You can see the original on the java2s page linked in my comment on the question.

For ease of reading:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string r =
            @"<(?'tag'\w+?).*>" +   // match first tag, and name it 'tag'
            @"(?'text'.*?)" +       // match text content, name it 'text'
            @"</\k'tag'>";          // match last tag, denoted by 'tag'

        string text = "<h1>hello</h1>";

        Match m = Regex.Match(text, r);
        Console.WriteLine(m.Groups["tag"]);    // h1
        Console.WriteLine(m.Groups["text"]);   // hello
    }
}
Bactos
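The book pattern above relies on a greedy `.*>` after the tag name plus backtracking, and without `RegexOptions.Singleline` it cannot cross the line breaks in the question's input. A hedged variant (my adjustment, not from the linked page) swaps `.*>` for `[^>]*>` so the opening tag ends at its first `>`, and turns on Singleline so the content may span lines. `TagMatcher` and `FirstElement` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; a variant of the book pattern, not the original.
static class TagMatcher
{
    private static readonly Regex TagRegex = new Regex(
        @"<(?'tag'\w+)[^>]*>" +  // opening tag: name it 'tag', then skip attributes up to the first '>'
        @"(?'text'.*?)" +        // shortest run of content (Singleline lets it cross '\n')
        @"</\k'tag'>",           // closing tag with the same name as 'tag'
        RegexOptions.Singleline);

    public static Match FirstElement(string input) => TagRegex.Match(input);
}

class Program
{
    static void Main()
    {
        Match h1 = TagMatcher.FirstElement("<h1>hello</h1>");
        Console.WriteLine(h1.Groups["tag"].Value);   // h1
        Console.WriteLine(h1.Groups["text"].Value);  // hello

        string html =
            "7687687toyi7fy\n" +
            "<body style=\"box-sizing: border-box; height: inherit; width: inherit; \n" +
            "margin: 0px; overflow: hidden\">\n" +
            "lkjlknkjyyugtfiytfif\n" +
            "</body>\n" +
            "</html>";

        Match body = TagMatcher.FirstElement(html);
        Console.WriteLine(body.Groups["tag"].Value);          // body
        Console.WriteLine(body.Groups["text"].Value.Trim());  // lkjlknkjyyugtfiytfif
    }
}
```

Like any regex approach, this still breaks on nested elements with the same name (for example a `<div>` inside a `<div>`), which is one reason the comments recommend a real HTML parser.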