
I have the following string, which is generated dynamically outside my control:

string chart = "<div id=\"divOne\">Label.</div>;";

I would like to remove the text "Label." from the enclosing div element.

I tried the following, but my regex knowledge is still too limited to get it working:

System.Text.RegularExpressions.Regex.Replace(chart, @"/(<div[^>]+>)[^<]+(<\/div>)/i", "");
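A small C# sketch of why that call leaves the string unchanged, assuming chart holds exactly the string above (the class and method names are hypothetical). .NET has no /pattern/flags literal syntax, so the leading / and trailing /i are treated as literal characters and the pattern never matches. Even with the delimiters removed, a replacement of "" deletes the whole match, tags included.

```csharp
using System.Text.RegularExpressions;

public static class DelimiterDemo
{
    // The string from the question (assumed input).
    public const string Chart = "<div id=\"divOne\">Label.</div>;";

    // The pattern as written in the question: "/" and "/i" are literal characters here.
    public static bool OriginalMatches() =>
        Regex.IsMatch(Chart, @"/(<div[^>]+>)[^<]+(<\/div>)/i");

    // Same pattern without the delimiters; case-insensitivity moves into RegexOptions.
    public static bool FixedMatches() =>
        Regex.IsMatch(Chart, @"(<div[^>]+>)[^<]+(</div>)", RegexOptions.IgnoreCase);

    // Replacing the whole match with "" removes the tags along with the text.
    public static string ReplaceWithEmpty() =>
        Regex.Replace(Chart, @"(<div[^>]+>)[^<]+(</div>)", "", RegexOptions.IgnoreCase);
}
```

With this input, OriginalMatches() is false, FixedMatches() is true, and ReplaceWithEmpty() leaves only ";".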

Thabiso Mofokeng
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Will May 11 '11 at 15:18

6 Answers


Using LINQPad, I got this snippet working. Hopefully it solves your problem.

string chart = "<div id=\"divOne\">Label.</div>;";

var regex = new System.Text.RegularExpressions.Regex(@">.*<");

var result = regex.Replace(chart, "><");

result.Dump(); // LINQPad's Dump() prints <div id="divOne"></div>;

Essentially, it matches everything from the first > to the last < (the .* is greedy) and replaces that span with ><.
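Because .* is greedy, this works for the single div in the question but also swallows the markup between elements when the input contains several. A minimal C# sketch, using a hypothetical two-div input and hypothetical helper names, compares it with a [^<]* variant that cannot cross a <:

```csharp
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Hypothetical input with two sibling divs.
    public const string TwoDivs = "<div>A</div><div>B</div>";

    // ">.*<" runs from the first '>' to the LAST '<', swallowing the tags in between.
    public static string Greedy(string html) => Regex.Replace(html, ">.*<", "><");

    // ">[^<]*<" cannot cross a '<', so each text run is emptied on its own.
    public static string PerElement(string html) => Regex.Replace(html, ">[^<]*<", "><");
}
```

Greedy(TwoDivs) returns <div></div>, collapsing both elements into one, while PerElement(TwoDivs) returns <div></div><div></div>.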

The approach you take depends on how robust the replacement needs to be. If you need something more general that targets a specific node, use a MatchEvaluator. This example produces a similar result:

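string pattern = @"<(?<element>\w*) (?<attrs>.*)>(?<contents>.*)</(?<elementClose>.*>)";

// Cut out only the "contents" group by position. (m.Value.Replace(contents, "")
// throws when contents is empty and also strips identical text inside the tags.)
var x = System.Text.RegularExpressions
    .Regex.Replace(chart, pattern, m => m.Value.Remove(m.Groups["contents"].Index - m.Index, m.Groups["contents"].Length));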

The pattern you use in this case is customizable, but it takes advantage of named group captures. It allows you to isolate portions of the match, and refer to them by name.
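A variation on the same idea, sketched in C#: instead of editing m.Value, the evaluator rebuilds each match from named groups, keeping the opening and closing tags and dropping the text between them. The group names open, contents and close, and the helper name, are illustrative choices, not part of the answer above.

```csharp
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Opening tag, text content (no nested tags), closing tag, each in a named group.
    static readonly Regex DivPattern = new Regex(
        @"(?<open><div\b[^>]*>)(?<contents>[^<]*)(?<close></div>)",
        RegexOptions.IgnoreCase);

    // The MatchEvaluator writes back only the "open" and "close" groups.
    public static string ClearDivText(string html) =>
        DivPattern.Replace(html, m => m.Groups["open"].Value + m.Groups["close"].Value);
}
```

For the question's string, ClearDivText returns <div id="divOne"></div>; with the trailing ";" untouched.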

John Nelson

Try this for your regex:

<div\b[^>]*>(.*?)<\/div>

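For an attribute-free div, the following produces the output <div></div>. Because the replacement is the literal string "<div></div>", any attributes on the original tag (such as id="divOne") are dropped.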

System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(@"<div\b[^>]*>(.*?)<\/div>");
Console.WriteLine(regex.Replace("<div>Label 1.</div>","<div></div>"));
Console.ReadLine();
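One caveat with the lazy (.*?) above, sketched in C# with a hypothetical multi-line input and hypothetical helper names: in .NET, . does not match \n unless RegexOptions.Singleline is set, so div contents containing a line break are left alone by the default pattern.

```csharp
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    const string Pattern = @"<div\b[^>]*>(.*?)</div>";

    // Hypothetical input whose text spans two lines.
    public const string MultiLine = "<div>Line 1\nLine 2</div>";

    // Default options: "." stops at '\n', so this input does not match.
    public static string Default(string html) => Regex.Replace(html, Pattern, "<div></div>");

    // Singleline: "." matches every character, including '\n'.
    public static string Singleline(string html) =>
        Regex.Replace(html, Pattern, "<div></div>", RegexOptions.Singleline);
}
```

Default(MultiLine) returns the input unchanged, while Singleline(MultiLine) returns <div></div>.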
NakedBrunch

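Your regex looks good to me, but don't include the '/.../i' delimiters and modifier: .NET takes the pattern as a plain string, so request case-insensitivity with (?i) or RegexOptions.IgnoreCase instead. Then use '$1$2' as your replacement string: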

var re = new System.Text.RegularExpressions.Regex(@"(?i)(<div[^>]+>)[^<]+(<\/div>)");
var result = re.Replace(chart, "$1$2");
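The inline (?i) flag and RegexOptions.IgnoreCase are two ways to ask for the same case-insensitive matching. A short C# sketch with hypothetical helper names shows both handling an uppercase <DIV>:

```csharp
using System.Text.RegularExpressions;

public static class CaseDemo
{
    const string Pattern = @"(<div[^>]+>)[^<]+(</div>)";

    // Case-insensitivity requested inline, at the start of the pattern.
    public static string Inline(string html) => Regex.Replace(html, "(?i)" + Pattern, "$1$2");

    // The same request made through RegexOptions.
    public static string WithOption(string html) =>
        Regex.Replace(html, Pattern, "$1$2", RegexOptions.IgnoreCase);
}
```

Both turn <DIV id="divOne">Label.</DIV> into <DIV id="divOne"></DIV>. Note that [^>]+ needs at least one character after <div, so a bare <div>Label.</div> is left unchanged; [^>]* would cover that case too.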
ridgerunner
  • Works! Two questions, though: 1. What does the (?i) do? 2. How does $1$2 return the enclosing element without the text? I mean, why does string.Empty not replace only the text inside? – Thabiso Mofokeng May 11 '11 at 15:50
  • The `(?i)` turns on `case-insensitive` mode. The `$1$2` says: _"replace the matching string with capture group 1 followed by capture group 2"_. We capture the DIV start tag in group 1 and the close tag in group 2. The contents are not captured and are thus discarded. This is pretty basic stuff. See the tutorial at [www.regular-expressions.info](http://www.regular-expressions.info/) for more details. Happy regexing! – ridgerunner May 12 '11 at 17:07
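To make that explanation concrete, here is a small C# sketch (the helper name is hypothetical) that returns what each numbered group holds for the question's string. Replacing with "" discards the entire match, tags included, whereas "$1$2" writes the two captured tags back.

```csharp
using System.Text.RegularExpressions;

public static class GroupsDemo
{
    // Returns group 1, group 2 and the whole match, i.e. what $1, $2 and $0 refer to.
    public static (string Group1, string Group2, string Whole) Inspect(string html)
    {
        Match m = Regex.Match(html, @"(?i)(<div[^>]+>)[^<]+(</div>)");
        return (m.Groups[1].Value, m.Groups[2].Value, m.Value);
    }
}
```

For the question's string, Group1 is <div id="divOne">, Group2 is </div>, and Whole is the full <div id="divOne">Label.</div>.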

You just need to write a pattern that matches only the text inside the div tag; then you can replace it with an empty string:

Regex.Replace(chart, yourPattern, string.Empty);
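One way to write such a pattern, sketched in C# as an assumption rather than the answerer's own code: put the tags in a lookbehind and a lookahead so the match consists only of the text, which is what lets string.Empty work as the replacement. This relies on .NET supporting variable-length lookbehind (the [^>]* inside (?<=...)); the helper name is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // The tags are asserted, not consumed, so only the text between them is replaced.
    public static string ClearDivText(string html) =>
        Regex.Replace(html, @"(?<=<div\b[^>]*>)[^<]+(?=</div>)", string.Empty,
            RegexOptions.IgnoreCase);
}
```

For the question's string this returns <div id="divOne"></div>;.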
Saleh

I'm a little confused by your question; it sounds like you are parsing some pre-generated HTML and want to remove every occurrence of the value of chart that appears inside a <div> tag. If that's correct, try this pattern (Regex.Escape makes regex metacharacters in chart, such as '.', match literally):

"(<div[^>]*>[^<]*)" + Regex.Escape(chart) + "([^<]*</div>)"

Concatenate the first and second groups (for example, with "$1$2" as the replacement string) and you should get your <div> back without chart.
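A C# sketch of this approach, assuming the value to remove is passed in as a hypothetical textToRemove parameter. Escaping matters here: without Regex.Escape, the '.' in "Label." would match any character.

```csharp
using System.Text.RegularExpressions;

public static class RemoveValueDemo
{
    // Removes textToRemove where it appears inside a div's text, keeping the tags and any
    // surrounding text (groups 1 and 2 are written back with "$1$2").
    public static string RemoveFromDivs(string html, string textToRemove) =>
        Regex.Replace(html,
            "(<div[^>]*>[^<]*)" + Regex.Escape(textToRemove) + "([^<]*</div>)",
            "$1$2");
}
```

RemoveFromDivs("<div>Label. and more</div>", "Label.") returns <div> and more</div>, while "<div>Labelx</div>" is left unchanged because the escaped '.' only matches a literal dot.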

lauren

If the markup is well-formed XML, LINQ to XML is a more robust option than a regex:

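// requires: using System.Xml.Linq; XElement.Parse only accepts well-formed XML
var element = XElement.Parse("<div id=\"divOne\">Label.</div>");
element.Value = "";
var value = element.ToString(); // <div id="divOne"></div>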
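A self-contained C# sketch of the same idea applied to the question's exact string. Because XElement.Parse rejects anything after the root element, the sketch assumes the trailing ';' can simply be trimmed first; the class and method names are hypothetical. It also contrasts setting Value with calling RemoveNodes().

```csharp
using System.Xml.Linq;

public static class XElementDemo
{
    // Assumption: the only non-XML debris is the trailing ';' from the question's string.
    static XElement ParseChart(string chart) => XElement.Parse(chart.TrimEnd(';'));

    public static string ClearWithValue(string chart)
    {
        XElement element = ParseChart(chart);
        element.Value = string.Empty;   // empty text content: serialized as <div ...></div>
        return element.ToString();
    }

    public static string ClearWithRemoveNodes(string chart)
    {
        XElement element = ParseChart(chart);
        element.RemoveNodes();          // no content at all: serialized as <div ... />
        return element.ToString();
    }
}
```

Setting Value to an empty string keeps a start/end tag pair (<div id="divOne"></div>), while RemoveNodes() yields the self-closing <div id="divOne" />, which matters if the result is fed to an HTML parser that treats <div /> as an unclosed tag.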

See also: RegEx match open tags except XHTML self-contained tags

Community
Will