
I have code that takes this string and parses it into an array of characters:

var textArray = Regex.Replace(text, @"</?span( [^>]*|/)?>",    
String.Empty).Trim().ToCharArray();

<span>そ</span><span>れ</span><span>に</span><span>も</span>拘<span>わ</span><span>ら</span>もも<span>ず</span>

But now I need to do something different and I am not sure how to go about this. What I need is to parse a string like this into an array like this:

そ
れ
に
も
拘
わ
ら
もも
ず

Where anything in between <span> and </span> is an element in the array and also anything in between </span> and <span>.

I would appreciate any advice on how I can use Regex to do this.

Alan2

1 Answer


You may use

var textArray = Regex.Split(text, @"(?:</?span(?:\s+[^>]*)?>)+")
    .Where(x => !string.IsNullOrEmpty(x));

The Regex.Split method splits a string at every match of the pattern. When a match occurs at the start or the end of the string, an empty item is added to the result, so you need .Where(x => !string.IsNullOrEmpty(x)) to filter those out. Note that .Where returns an IEnumerable<string>; append .ToArray() if you need an actual array.
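To make the empty-item behaviour concrete, here is a minimal sketch that runs the split on the sample string from the question and compares the raw result with the filtered one. The variable names (raw, textArray) are only illustrative, and the trailing .ToArray() is an addition for callers who need an array rather than a lazy sequence.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input copied from the question.
var text = "<span>そ</span><span>れ</span><span>に</span><span>も</span>拘<span>わ</span><span>ら</span>もも<span>ず</span>";

// Raw split: the tag runs at the very start and very end each yield an empty item.
var raw = Regex.Split(text, @"(?:</?span(?:\s+[^>]*)?>)+");

// Filtered split: empty items removed and materialized as string[].
var textArray = raw.Where(x => !string.IsNullOrEmpty(x)).ToArray();

Console.WriteLine(raw.Length);        // 11 (9 chunks plus 2 empty boundary items)
Console.WriteLine(textArray.Length);  // 9
Console.WriteLine(string.Join("|", textArray));
```

Because the pattern is wrapped in (?:...)+, a run such as </span><span> counts as a single separator, which is why no empty items appear between adjacent tags.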

The regex matches 1 or more occurrences of

  • < - < char
  • /? - an optional /
  • span - the literal text span
  • (?:\s+[^>]*)? - an optional sequence of 1+ whitespace chars followed by 0 or more chars other than > (i.e. the tag's attributes, if any)
  • > - a > char.
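The breakdown above can be checked with a small sketch. It takes only the single-tag part of the pattern and adds ^ and $ anchors (an addition made here for testing, not part of the answer's pattern) so that each input must be exactly one tag. The tag names and the class attribute are made-up test inputs.

```csharp
using System;
using System.Text.RegularExpressions;

// Single-tag part of the pattern, anchored so the whole input must be one tag.
var tag = new Regex(@"^</?span(?:\s+[^>]*)?>$");

Console.WriteLine(tag.IsMatch("<span>"));                 // True: plain opening tag
Console.WriteLine(tag.IsMatch("</span>"));                // True: closing tag via the optional /
Console.WriteLine(tag.IsMatch("<span class=\"kana\">"));  // True: whitespace plus attributes
Console.WriteLine(tag.IsMatch("<spanner>"));              // False: span must be followed by whitespace or >
```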

See the regex demo

The non-capturing group ((?:...)) is important because Regex.Split also adds every substring captured by a capturing group to the result.
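A short sketch of that difference, using a shortened sample string (an assumption made here to keep the output small) and the same pattern written once with a capturing group and once with a non-capturing one:

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened sample: two chunks of text separated by single tags.
var sample = "そ</span>拘<span>わ";

// Capturing group: the captured tag text is inserted into the result.
var withCapture = Regex.Split(sample, @"(</?span(?:\s+[^>]*)?>)+");

// Non-capturing group: only the text between the tags is returned.
var withoutCapture = Regex.Split(sample, @"(?:</?span(?:\s+[^>]*)?>)+");

Console.WriteLine(withCapture.Length);     // 5: the tags appear as items too
Console.WriteLine(withoutCapture.Length);  // 3
```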

Alternatively, if you want to only grab all texts in between span open/close tags:

var textArray = Regex.Matches(text, @"(?s)<span(?:\s+[^>]*)?>(.*?)</span>")
        .Cast<Match>()
        .Select(x => x.Groups[1].Value);

See the C# demo.

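Here, <span(?:\s+[^>]*)?> matches the opening span tag together with any attributes, (.*?) lazily captures the inner text into Group 1, and </span> matches the closing tag. The (?s) inline option makes . match line breaks as well, so the inner text may span several lines.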
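As a rough comparison with the Regex.Split approach, the sketch below runs this pattern on the sample string from the question. The variable name inner is illustrative. Text that sits outside any span pair (拘 and もも in the sample) is not returned, so this variant only fits if those pieces are not needed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input copied from the question.
var text = "<span>そ</span><span>れ</span><span>に</span><span>も</span>拘<span>わ</span><span>ら</span>もも<span>ず</span>";

// Collect Group 1 (the text inside each <span>...</span> pair).
var inner = Regex.Matches(text, @"(?s)<span(?:\s+[^>]*)?>(.*?)</span>")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(inner.Length);           // 7: only the text wrapped in span tags
Console.WriteLine(string.Join("|", inner));
```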

Wiktor Stribiżew