
I am trying to strip markup and Unicode characters from a string using a regular expression like the one below:

string b = Regex.Replace(StNameTag, "<[^>]+>|\u200B|\n|\t|\r", string.Empty);

The problem is that it does not remove some entities, such as &#160;, and some strings even contain text like this:

ED5D6EB4918943C197E874EF6414E351 .ExternalClass p.MsoNormal, ED5D6EB4918943C197E874EF6414E351 .ExternalClass li.MsoNormal, ED5D6EB4918943C197E874EF6414E351 .ExternalClass div.MsoNormal {margin-top:0in;margin-right:0in;margin-bottom:8.0pt;margin-left:0in;line-height:107%;font-size:11.0pt;font-family:"Calibri",sans-serif;}ED5D6EB4918943C197E874EF6414E351 .ExternalClass .MsoChpDefault {font-family:"Calibri",sans-serif;}ED5D6EB4918943C197E874EF6414E351 .ExternalClass .MsoPapDefault {margin-bottom:8.0pt;line-height:107%;}ED5D6EB4918943C197E874EF6414E351 .ExternalClass div.WordSection1 {}&#160;ABCD

I need only ABCD from the above string. How can I remove everything else?

user4912134
  • Why not match any whitespace using `\s`? Use `@"<[^>]+>|[\u200B\s]"`. You could also just add `\u00A0` (the `NO-BREAK SPACE` character) as an alternative in your current regex, or `&#160;` if it appears as a literal substring. – Wiktor Stribiżew Sep 05 '18 at 17:27
  • @WiktorStribiżew I used `<[^>]+>|\u200B|\n|\t|\r|\u00A0` but I still see `&#160;` – user4912134 Sep 05 '18 at 17:36
  • `&#160;` is an entity that, when rendered, is the `\u00A0` Unicode character. Also, regex patterns in C# are usually written with doubled backslashes like this `"<[^>]+>|\\u200B|\\n|\\t|\\r|\\u00A0"` or as a _verbatim_ string like this `@"<[^>]+>|\u200B|\n|\t|\r|\u00A0"`, so that the regex engine rather than the compiler interprets the escapes. –  Sep 05 '18 at 19:41
  • This looks like an [XY problem](https://meta.stackexchange.com/a/66378/388741). It looks like you are trying to manipulate HTML using Regular expressions, which is a [regularly occurring source of questions here](https://meta.stackoverflow.com/questions/252396/regex-and-html-the-long-tail-annoys-me). For example, see [this question, and its accepted answer](https://stackoverflow.com/questions/1732348/). So if I'm right, do yourself a favour, and use an HTML parser (and try to avoid the XY problem). – Richardissimo Sep 05 '18 at 21:40
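
The suggestions in the comments can be combined into a rough sketch. It assumes the leftover CSS comes from a `<style>` element whose tags the original regex removed while leaving its content behind, so the whole element is dropped before the remaining tags are stripped. Entities such as `&#160;` are then decoded with `WebUtility.HtmlDecode`, so that `\u00A0` is actually present in the string when the last regex runs. The `TextCleaner` class and `Clean` method are hypothetical names used only for illustration. An HTML parser, as the last comment recommends, is likely the sturdier route for arbitrary input.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class TextCleaner
{
    // Assumption: unwanted CSS/JS sits inside <style> or <script> elements.
    // Removes those elements together with their content.
    private static readonly Regex StyleOrScript = new Regex(
        @"<(style|script)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Removes any remaining tag, e.g. <p> or </div> (same pattern as the question).
    private static readonly Regex Tag = new Regex(@"<[^>]+>");

    // Zero-width space, no-break space, and the control characters
    // the original pattern already targeted.
    private static readonly Regex Invisible = new Regex(@"[\u200B\u00A0\r\n\t]");

    public static string Clean(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        string text = StyleOrScript.Replace(html, string.Empty);
        text = Tag.Replace(text, string.Empty);

        // Turns entities such as &#160; or &nbsp; into real characters (U+00A0).
        text = WebUtility.HtmlDecode(text);

        return Invisible.Replace(text, string.Empty).Trim();
    }
}
```

For example, `TextCleaner.Clean("<style>p.MsoNormal {margin-top:0in;}</style><p>&#160;ABCD</p>")` returns `ABCD`. If the stored string has already lost its `<style>` tags, as in the sample shown in the question, this sketch cannot tell the CSS apart from real text; the cleanup would have to run on the original HTML instead.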
