I have a string that contains HTML text:

string html = "<html>   \r\n\n<body> \n<h1>Hello World</h1>    \r \n \n</body> </html>"; 

I want to remove the whitespace between the HTML tags, so that the string comes out looking like this:

string html = "<html><body><h1>Hello World</h1></body></html>"; 

I am using .NET. Is there any built-in functionality to do this?

andrefadila
  • possible duplicate of [C# how to Regex.Replace "\r\n" (the actual characters, not the line break)](http://stackoverflow.com/questions/4311882/c-sharp-how-to-regex-replace-r-n-the-actual-characters-not-the-line-break) – zod Aug 07 '14 at 21:04
  • Agree with @zod, you might not need `RegEx`, but you will probably want to remove tabs (`\t`) as well, the same way as it is written for `\r` and `\n`. – Rolice Aug 07 '14 at 21:06
  • Thanks for the link @zod, I am trying it. – andrefadila Aug 07 '14 at 21:12
  • `/\s/g` will match all whitespace, or you can specify a character class: `/[ \r\n]/g` – Sam Aug 07 '14 at 21:58

1 Answer

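using System.Text.RegularExpressions;
// Match a run of whitespace (spaces, tabs, \r, \n) sitting between '>' and '<'.
Regex REGEX_TAGS = new Regex(@">\s+<", RegexOptions.Compiled);
string html = "<html>   \r\n\n<body> \n<h1>Hello World</h1>    \r \n \n</body> </html>";
// Replace each match with "><", leaving text such as "Hello World" untouched.
html = REGEX_TAGS.Replace(html, "><");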
Rumplin
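The idea in this answer can be wrapped in a small helper. This is only a sketch: the class name `HtmlWhitespace` and the method name `StripBetweenTags` are made up for illustration, and the approach is purely regex-based, so it knows nothing about elements such as `<pre>` or `<script>` where whitespace may matter.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of .NET: removes whitespace-only runs between adjacent tags.
public static class HtmlWhitespace
{
    // Matches '>' followed by one or more whitespace characters and then '<'.
    private static readonly Regex BetweenTags =
        new Regex(@">\s+<", RegexOptions.Compiled);

    public static string StripBetweenTags(string html)
    {
        // Each match is replaced with "><", so the two tags end up adjacent.
        return BetweenTags.Replace(html, "><");
    }
}
```

Calling `HtmlWhitespace.StripBetweenTags` on the string from the question should give `<html><body><h1>Hello World</h1></body></html>`. The space in `Hello World` survives because the pattern only matches when nothing but whitespace sits between a `>` and the next `<`.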
  • I have a "feeling" that `Hello World` will become `HelloWorld`... You should not remove spaces, or at least replace `+` with `{2,}` to catch only runs of two or more spaces and replace them with a single space. – Rolice Aug 07 '14 at 21:07
  • Your feeling is wrong. The Regex looks for blank spaces and "Hello World" is not blank. – Rumplin Aug 07 '14 at 21:12
  • Agreed, but it would not have caught `> Hello world <` in the previous version of your answer. – Rolice Aug 07 '14 at 21:17
  • Everything between `>` and `<` that is not empty is user content; maybe the user wants those spaces? – Rumplin Aug 07 '14 at 21:21
  • You know that browsers will not display more than one space even if there are 20 one after another (` ` is not `&nbsp;`), so why would we keep them? Do not get me wrong, I am trying to improve your answer against the question. – Rolice Aug 07 '14 at 21:25
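
The alternative raised in these comments, collapsing runs of whitespace inside text content rather than deleting them, could look roughly like the following. It is a sketch under assumptions: `HtmlWhitespaceCollapse` and `CollapseRuns` are hypothetical names, and the `{2,}` quantifier is the one suggested above, so a single whitespace character (including a lone `\n`) is left as it is.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: collapses runs of two or more whitespace characters to one space.
public static class HtmlWhitespaceCollapse
{
    // \s{2,} matches two or more consecutive whitespace characters.
    private static readonly Regex WhitespaceRun =
        new Regex(@"\s{2,}", RegexOptions.Compiled);

    public static string CollapseRuns(string html)
    {
        // Every matched run becomes exactly one space character.
        return WhitespaceRun.Replace(html, " ");
    }
}
```

For example, `HtmlWhitespaceCollapse.CollapseRuns("<h1>Hello    \n World</h1>")` should return `<h1>Hello World</h1>`. The entity text `&nbsp;` contains no whitespace characters, so it is not affected. Whether to apply this at all depends on whether the extra spaces are meaningful to the user, which is the point being debated here.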