I need a regular expression that finds English text embedded in Chinese text and wraps it in a span with a class.

Example input:

<p>当然,你要学习<a href='#' target='_blank'>“<b>Megento</b>”</a></p>

Expected output:

<p>当然,你要学习<a href='#' target='_blank'>“<b><span class="english">Megento</span></b>”</a></p>
Richard
Cader

1 Answer

.NET regular expressions can match characters by Unicode general category or by named Unicode block, using the \p{...} syntax. For example, the regex \p{IsBasicLatin} matches x but not Ǝ (U+018E: Latin Capital Letter Reversed E), because Ǝ lies outside the Basic Latin block.

So matching the English runs inside an element's text content is quite possible.

But don't use a regex to parse the HTML itself. Use an HTML parser to process the markup, then apply the regex only to the text content it gives you.
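
As a rough illustration of that split (parser for the markup, regex for the text), here is a minimal sketch in Python rather than .NET. It is not a drop-in solution: Python's built-in `re` module has no `\p{IsBasicLatin}`, so an explicit ASCII-letter range stands in for it, and the names `EnglishWrapper`, `wrap_english` and `ENGLISH_RUN` are made up for this example. It also assumes "English" means runs of ASCII letters and does nothing special for `<script>` or `<style>` contents.

```python
import re
from html.parser import HTMLParser

# Stand-in for \p{IsBasicLatin}: one or more ASCII-letter words,
# optionally separated by whitespace (an assumption, not the .NET class).
ENGLISH_RUN = re.compile(r"[A-Za-z]+(?:\s+[A-Za-z]+)*")


class EnglishWrapper(HTMLParser):
    """Re-emits the HTML it is fed, wrapping English runs in text nodes only."""

    def __init__(self):
        # Keep entities such as &amp; as separate events so they are not
        # mistaken for English words.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as written in the source,
        # so attribute values are never touched by the regex.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # The regex only ever sees text content, never markup.
        self.out.append(
            ENGLISH_RUN.sub(r'<span class="english">\g<0></span>', data)
        )

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def wrap_english(html):
    parser = EnglishWrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = "<p>当然,你要学习<a href='#' target='_blank'>“<b>Megento</b>”</a></p>"
print(wrap_english(sample))
```

Note that `_blank` inside the `target` attribute is left alone, which is exactly what a regex run over the raw HTML string would get wrong.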

Richard