See post: https://stackoverflow.com/questions/20657177/how-to-remove-div-tag-from-html-editor-contents-in-asp-net#=

The post I refer to contains an answer that removes all div tags from a string. When a string contains multiple div tags, I only want to remove the div tags whose 'width' attribute is set to (for example) 100px.

How can I adjust the regex below to meet my requirement?

string divTag = "div";
objNews.Article = Server.HtmlEncode(Regex.Replace(ckedi.Content.Trim().ToString(), "(</?)" + divTag + @"((?:\s+.*?)?>)", ""));

Thank you

Frederik
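
One possible direction, offered as a rough sketch rather than a definitive answer: instead of stripping every opening and closing div tag separately, match a whole div element whose opening tag carries the wanted width and replace it with its inner content. The class name `DivStripper` and the method `RemoveDivsWithWidth` are hypothetical names made up for this illustration. The sketch assumes the width appears either as `width="100px"` or inside a style value such as `width:100px`, that the targeted divs contain no nested divs, and that no attribute value contains a `>` character. If any of those assumptions fail, an HTML parser is the safer tool, as the comments below argue.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DivStripper
{
    // Hypothetical helper: unwraps every <div> whose opening tag sets
    // width to the given value, keeping the content between the tags.
    // Assumes no nested <div> inside the targeted element and no '>'
    // inside attribute values.
    public static string RemoveDivsWithWidth(string html, string width)
    {
        string pattern =
            @"<div\b[^>]*?"                   // opening tag, up to the width setting
            + @"(?<![\w-])width\s*[:=]\s*"    // width= or width: (but not max-width)
            + @"[""']?\s*" + Regex.Escape(width) + @"\b"
            + @"[^>]*>"                       // remainder of the opening tag
            + @"(.*?)"                        // inner content, captured as group 1
            + @"</div\s*>";                   // first closing tag that follows

        return Regex.Replace(
            html,
            pattern,
            "$1",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

Under those assumptions, the call in the question could become something like `Server.HtmlEncode(DivStripper.RemoveDivsWithWidth(ckedi.Content.Trim().ToString(), "100px"))`. Divs with a different width, or with no width at all, are left untouched.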

  • Did you self-mark your question as a duplicate of another? I'm confused. Also, can you give us examples and post your code so we can see what you've tried? That way we can also help you understand what's not working. – ctwheels Feb 02 '18 at 21:00
  • Thanks for posting it, but you should add it to your question along with the expected outcome and anything you've tried :) – ctwheels Feb 02 '18 at 21:08
  • Your best bet is to use a parser. See [H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) for more info – ctwheels Feb 02 '18 at 21:10
  • @ctwheels I don't understand why everyone is so against parsing HTML with regex. Most HTML parsers require valid HTML, and invalid HTML is not uncommon on the web. If you need to parse a little data from HTML with regex, I don't see a problem. – Tony Feb 02 '18 at 21:16
  • @Tony it's because it can very easily be broken. Take, for example `
    a
    `. Get `a`
    – ctwheels Feb 02 '18 at 21:20
  • There is no problem with regular expressions acting as HTML parsers in limited scopes, [unless you don't know what you are doing](https://stackoverflow.com/a/4234491/1020526). @ctwheels – revo Feb 02 '18 at 21:26
  • @ctwheels I understand it's not perfect. But what's your alternative if the HTML is broken and the parser won't pick it up? I've been in situations where I use regex to try to fix broken HTML before loading it into a parser. I agree that ideally you want to load it into a parser, but it's not always that easy. – Tony Feb 02 '18 at 21:26
  • @Tony those are edge cases where regex should be used for HTML (and pretty much never in production code). This user never specified that the HTML code they have is malformed and so the correct way for this user to go about their problem is to use a parser. – ctwheels Feb 02 '18 at 21:28
  • @ctwheels what if you are just building a web-crawler that is picking up all kinds of crazy HTML on the web? That is not an uncommon application type at all, and could be a production app. – Tony Feb 02 '18 at 21:32
  • @Tony same logic applies. – ctwheels Feb 02 '18 at 21:35
  • @ctwheels I'm not sure what you mean by that. I know it's a common opinion to always use a parser, but that has always come with major drawbacks in my experience. I'm not sure what I'm missing here. – Tony Feb 02 '18 at 21:43
  • @Tony As @revo pointed out, you can *sometimes* use it, but it's better not to in the majority of cases. Again, try getting `a` from `
    a
    `. Much more challenging than you'd think.
    – ctwheels Feb 02 '18 at 22:02
  • @ctwheels Wouldn't that fail on a parser as well? – Derek Feb 02 '18 at 22:05
  • @Derek a parser would not fail there because it's parsing the value of `style` which is `content:'<'`. – ctwheels Feb 02 '18 at 22:06
