I have strings that look like this:

 "<span>X</span>間違<span>う</span><span>ABCDE</span>"

How can I add spans to the elements that do not have spans already so the string looks like this:

 "<span>X</span><span>間</span><span>違</span><span>う</span><span>ABCDE</span>"

Is this something that I can do with Regex?

Example 2 source:

"<span>X</span>A<span>う</span>ABC<span>Y</span>"

Example 2 result:

"<span>X</span><span>A</span><span>う</span><span>A</span><span>B</span><span>C</span><span>Y</span>" 

Example 3 source:

"間違<span>う</span>"

Example 3 result:

"<span>間</span><span>違</span><span>う</span>"

Example 4 source:

"<span>う</span>間違"

Example 4 result:

"<span>う</span><span>間</span><span>違</span>"

Please note that only the characters that are not already inside a span need to be wrapped, each in its own span. I hope that makes sense. So in the first case "ABCDE" needs to stay as "ABCDE".

Alan2
  • Did you try `Regex.Replace`? – Wiktor Stribiżew Nov 03 '19 at 21:59
  • @WiktorStribiżew - I have used Regex.Replace before for simple needs, but in this case I am not sure how to use it because I need each character inside 間違 to be surrounded by a new span, not just "間違" as a whole. Note that these could be different characters as well. – Alan2 Nov 03 '19 at 22:19
  • It is easier than that. You need to wrap each char outside of `<span>[^<]*</span>` with span tags. – Wiktor Stribiżew Nov 03 '19 at 22:55
  • @WiktorStribiżew - Can you give an example as an answer. I tried the other two solutions but both have some problems and don't work for the data that I have. – Alan2 Nov 03 '19 at 22:57
  • Why is the result for `"間違"` not `<span>間違</span>`? – tymtam Nov 03 '19 at 23:03
  • Unfortunately that's the requirement I've been given, and I confirmed that this is the case. Each character not in a span needs to be placed in its own span, not combined into one span. – Alan2 Nov 03 '19 at 23:15
  • A universal but slightly inefficient solution is `Regex.Replace(text, @"(?s)(<span(?:\s+[^>]*)?>.*?</span>)|\P{M}\p{M}*", x => x.Groups[1].Success ? x.Groups[1].Value : $"<span>{x.Value}</span>")` – Wiktor Stribiżew Nov 03 '19 at 23:35
  • @WiktorStribiżew - could you add this as an answer so I can review and accept if it works. No problem if it's inefficient as the code will only execute once every few seconds for 10,000 times in total. – Alan2 Nov 04 '19 at 06:53

3 Answers

(Updated in the light of the new examples)

Regex will fail for HTML. Please see "RegEx match open tags except XHTML self-contained tags".

I've been warned, I want to use regex for html

Something like this could do the job.

Regex.Replace(input, "(^|</span>)([^<]+)(<span>|$)", "$1<span>$2</span>$3");

The run is matched with [^<]+ rather than .*? so that an empty run (at the start of the string or between two adjacent spans) does not produce an empty <span></span>.

Please note that this will not split runs of text that are not wrapped in spans; it wraps each run as a whole. Since text that is already wrapped in spans is not split either, this seems reasonable.


Test

string input = "間違<span>う</span>X<span>ABC</span>Y<span>DEF</span>GHI";

Console.WriteLine(input);
var replaced = Regex.Replace(input, "(^|</span>)([^<]+)(<span>|$)", "$1<span>$2</span>$3");

Console.WriteLine(replaced);

Output

間違<span>う</span>X<span>ABC</span>Y<span>DEF</span>GHI
<span>間違</span><span>う</span><span>X</span><span>ABC</span><span>Y</span><span>DEF</span><span>GHI</span>
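If every character of an unwrapped run has to get its own span, as the question asks, the same delimiter idea can be combined with a match evaluator. The following is only a sketch under some assumptions: the spans are plain `<span>` tags, they are not nested, and the loose text contains no `<`. The names `RunSplitter` and `WrapEachChar` are made up for illustration. `StringInfo.GetTextElementEnumerator` is used so that a base character and its combining marks stay in one span.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class RunSplitter
{
    // A loose run is one or more non-'<' chars at the start of the string
    // or right after a closing </span>. Text inside a span never matches,
    // because it is preceded by "<span>", not by "^" or "</span>".
    private static readonly Regex LooseRun = new Regex(@"(^|</span>)([^<]+)");

    // Hypothetical helper: wraps each text element of every loose run.
    public static string WrapEachChar(string input) =>
        LooseRun.Replace(input, m =>
        {
            // Keep the delimiter that preceded the run (empty at the start).
            var sb = new StringBuilder(m.Groups[1].Value);
            var e = StringInfo.GetTextElementEnumerator(m.Groups[2].Value);
            while (e.MoveNext())
                sb.Append("<span>").Append(e.GetTextElement()).Append("</span>");
            return sb.ToString();
        });

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        // Example 2 from the question.
        Console.WriteLine(WrapEachChar("<span>X</span>A<span>う</span>ABC<span>Y</span>"));
    }
}
```

Run on Example 2 from the question, this should print the expected Example 2 result, with "A", "B" and "C" each in their own span.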
tymtam

Since the string you process is not actually HTML and just plain text with non-nested span tags, the problem can be solved with regex while treating <span> and </span> as starting and ending delimiters.

You may capture and keep the text between two tags and match any other char in other contexts:

var pattern = @"(?s)(<span(?:\s+[^>]*)?>.*?</span>)|\P{M}\p{M}*";
var result = Regex.Replace(text, pattern, x => 
    x.Groups[1].Success ? x.Groups[1].Value : $"<span>{x.Value}</span>");

The pattern becomes more efficient if you replace .*? with the unrolled [^<]*(?:<(?!/span>)[^<]*)* (the (?s) flag is then no longer needed):

var pattern = @"(<span(?:\s+[^>]*)?>[^<]*(?:<(?!/span>)[^<]*)*</span>)|\P{M}\p{M}*";

Details

  • (<span(?:\s+[^>]*)?>[^<]*(?:<(?!/span>)[^<]*)*</span>) - Group 1: matches and captures a whole span element:
    • <span - a literal substring, then
    • (?:\s+[^>]*)?> - an optional sequence of 1+ whitespace chars followed by 0+ chars other than >, and then a literal >
    • [^<]* - 0+ chars other than <, followed by
    • (?:<(?!/span>)[^<]*)* - 0 or more occurrences of a < that is not followed by /span>, each followed by 0+ chars other than <, and then
    • </span> - the literal </span> text
  • | - or
  • \P{M}\p{M}* - any char other than a combining mark followed by 0+ combining marks (an approximation of a Unicode grapheme).

The x.Groups[1].Success ? x.Groups[1].Value : $"<span>{x.Value}</span>" logic puts the Group 1 value back unchanged if Group 1 participated in the match; otherwise, it wraps the matched char with span tags.
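To check the pattern against the four examples from the question, a small console program along these lines can be used. It is a minimal sketch: the pattern and the match evaluator are the ones described above, while the `SpanWrapper` class and the `WrapLooseChars` method are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class SpanWrapper
{
    // Group 1 captures a complete <span ...>...</span> element; the second
    // alternative matches one non-mark char plus any combining marks after it.
    private static readonly Regex Pattern = new Regex(
        @"(<span(?:\s+[^>]*)?>[^<]*(?:<(?!/span>)[^<]*)*</span>)|\P{M}\p{M}*");

    // Hypothetical helper: keeps existing spans, wraps every other char.
    public static string WrapLooseChars(string text) =>
        Pattern.Replace(text, m =>
            m.Groups[1].Success ? m.Groups[1].Value : $"<span>{m.Value}</span>");

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // The four source strings from the question.
        string[] samples =
        {
            "<span>X</span>間違<span>う</span><span>ABCDE</span>",
            "<span>X</span>A<span>う</span>ABC<span>Y</span>",
            "間違<span>う</span>",
            "<span>う</span>間違",
        };

        foreach (var sample in samples)
            Console.WriteLine(WrapLooseChars(sample));
    }
}
```

Each printed line should equal the corresponding expected result from the question; in particular, `<span>ABCDE</span>` in the first sample is left as a single span.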

Wiktor Stribiżew

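You can strip the tags to get the plain text, then add the tags to each character. Note that this also splits the contents of existing multi-character spans, so "ABCDE" would end up as five separate spans.

Example: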

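    // Requires: using System.Text;
    var span = "<span>X</span>間違<span>う</span><span>Y</span>";

    // Strip every span tag to get the plain text.
    var plain = span.Replace("<span>", "").Replace("</span>", "").Trim();

    var sb = new StringBuilder();

    // Wrap each character in its own span.
    for (int x = 0; x < plain.Length; x++)
    {
        sb.Append($"<span>{plain[x]}</span>");
    }

    var result = sb.ToString();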
iSR5