How can I remove characters between < and > using regex in C#?

Question

I have a string str = "<u>rag</u>" and I want to get only "rag" from it. How can I do that using regex?

My code is below. The output I get is an empty string ("").

Thanks in advance.

C# code:

string input="<u>ragu</u>";
string regex = "(\\<.*\\>)";
string output = Regex.Replace(input, regex, "");

Is it HTML or plain text? – Pradip Apr 10 '13 at 12:15
It's better to use the HTML Agility Pack. – Pradip Apr 10 '13 at 12:20

score 8 · Answer 1 · answered Apr 10 '13 at 12:16

const string HTML_TAG_PATTERN = "<.*?>";
// Regex.Replace returns a new string, so keep the result.
string output = Regex.Replace(str, HTML_TAG_PATTERN, string.Empty);

vborutenko

+1 for being the first to come up with this simple non-greedy expression. – Floris Apr 10 '13 at 12:24

score 4 · Accepted Answer · edited Nov 28 '17 at 22:48

Using regex to parse HTML is not recommended.

Regex is meant for regular patterns. HTML is not regular in its format (XHTML aside). For example, an HTML file can be valid even when a closing tag is missing, and that could break your code.

Use an HTML parser such as HtmlAgilityPack.

WARNING: don't rely on the following in real code.

To solve your regex problem:

<.*> matches < followed by zero or more characters (i.e. u>rag</u) up to the last >, so the whole string is replaced.

You should replace it with this regex:

<.*?>

.* is greedy, i.e. it consumes as many characters as it can match.

.*? is lazy, i.e. it consumes as few characters as possible.
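To make the greedy/lazy difference concrete, here is a minimal sketch that runs both patterns on the string from the question. The class and method names (TagStripDemo, StripGreedy, StripLazy) are made up for illustration; only Regex.Replace comes from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class TagStripDemo
{
    // Greedy: .* runs to the LAST '>', so one match can span several tags.
    public static string StripGreedy(string input)
    {
        return Regex.Replace(input, "<.*>", string.Empty);
    }

    // Lazy: .*? stops at the FIRST '>', so each tag is matched on its own.
    public static string StripLazy(string input)
    {
        return Regex.Replace(input, "<.*?>", string.Empty);
    }

    public static void Main()
    {
        string input = "<u>rag</u>";
        Console.WriteLine("greedy: '" + StripGreedy(input) + "'"); // greedy: ''
        Console.WriteLine("lazy:   '" + StripLazy(input) + "'");   // lazy:   'rag'
    }
}
```

The greedy version reproduces the empty output from the question, because its single match covers the entire string. The warning above still applies: this is fine for a quick, known input, not for arbitrary HTML.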

score 0 · Answer 3 · answered Apr 10 '13 at 12:16

Sure you can:

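    string input = "<u>ragu</u>";
    // Note: [a-z] matches exactly one lowercase letter, so this only strips
    // single-letter tags such as <u>, </u>, <b>, </b>.
    string regex = "(\\<[/]?[a-z]\\>)";
    string output = Regex.Replace(input, regex, "");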

Piotr Stapp

score 0 · Answer 4 · answered Apr 10 '13 at 12:17

You don't need to use regex for that.

string input = "<u>rag</u>".Replace("<u>", "").Replace("</u>", "");
Console.WriteLine(input);

Soner Gönül

score 0 · Answer 5 · edited May 23 '17 at 11:52

Your code was almost correct; a small modification makes it work:

 string input = "<u>ragu</u>";
 string regex = @"<.*?\>";
 string output = Regex.Replace(input, regex, string.Empty);

Output is 'ragu'.

EDIT: this solution may not be the best. As user the-land-of-devils-srilanka points out, you should not use regex to parse HTML. See also "RegEx match open tags except XHTML self-contained tags".
