Splitting a string and keeping the delimiters within the results

Question

Reading this question, it seems Regex is the solution to my problem.

This is the HTML I'm trying to split:

\n\t\t\t
    <td class=\"stats_name\">
        Damage \n\t\t\t

    <td class=\"stats_value\">
        53 \n\t\t\t

    <td class=\"stats_modifier\">
        (<span class=\"ability_per_level_stat\">+3.2 / per level</span>) \n\t\t\n\t\t  

    </td>

    </td>

    </td>

For reasons of my own, I need to split this on the string <td. This worked well enough with HtmlAgilityPack and String.Split; however, String.Split removes the delimiter and I need it present in each result.

var statCells = rowDocument.DocumentNode.InnerHtml.Split(new string[] {"<td"}, StringSplitOptions.RemoveEmptyEntries).ToList();

And here's the same "function" using Regex to keep the delimiter. However, it doesn't work as expected and returns far too many strings. I think it's splitting on "<", "t" and "d" individually.

var statCells = Regex.Split(rowDocument.DocumentNode.InnerHtml, @"(?<=[<td])").ToList();

How can I use Regex.Split to split on "<td"?

What do you mean by splitting on td? How does this not work with HtmlAgilityPack? If you call doc.DocumentElement.SelectNodes("td"), you get each td node, including its tag name. — Polity, Dec 09 '11 at 02:24
@Polity: Try it! It doesn't work as you'd expect because these particular TD's don't have closing elements and the content is stretched to encompass everything until the end. :) — Only Bolivian Here, Dec 09 '11 at 02:25

score 2 · Accepted Answer · answered Dec 09 '11 at 02:07

@"(?<=[<td])" splits after every <, t, or d, because [<td] is a character class: it matches any single one of those characters, not the sequence <td. Use this instead if you want the <td at the beginning of the next string (rather than at the end of the previous one):

@"(?=<td)"
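
To make the difference concrete, here is a minimal sketch that runs both patterns on a shortened, made-up version of the question's HTML (the sample string and variable names are assumptions, not the asker's real data). It is written with C# top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input, shortened from the HTML in the question.
string html = "\n\t<td class=\"a\">Damage\n\t<td class=\"b\">53";

// Lookbehind on a character class: a zero-width match after every
// single '<', 't', or 'd', so the input is cut into many fragments.
string[] tooMany = Regex.Split(html, @"(?<=[<td])");

// Lookahead on the literal sequence "<td": a zero-width match just
// before each "<td", so the delimiter stays at the start of each piece.
string[] cells = Regex.Split(html, @"(?=<td)");

Console.WriteLine("character class: " + tooMany.Length + " pieces");
Console.WriteLine("lookahead: " + cells.Length + " pieces");
foreach (string cell in cells)
{
    Console.WriteLine("[" + cell.Trim() + "]");
}
```

Note that the first element of cells is whatever text preceded the first <td (here only whitespace), so it may need to be filtered out.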

The lookahead split is likely to be slower than the original solution, though. If you use String.Split and just prepend <td to each resulting string, that should give the same result but faster, because no regex is involved.
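
A rough sketch of the String.Split variant, under some assumptions: the sample string and names are made up, and the piece before the first <td is skipped rather than prefixed, since it never followed a delimiter. That skipping step is my reading of the suggestion, not something the answer spells out. Like any top-level-statement program, it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const string Delimiter = "<td";

// Hypothetical sample input, shortened from the HTML in the question.
string html = "\n\t<td class=\"a\">Damage\n\t<td class=\"b\">53";

// Split with StringSplitOptions.None so that index 0 is always the text
// before the first delimiter, even when that text is empty.
string[] parts = html.Split(new[] { Delimiter }, StringSplitOptions.None);

// Skip the leading piece, then put the delimiter back on the rest.
List<string> cells = parts
    .Skip(1)
    .Select(part => Delimiter + part)
    .ToList();

foreach (string cell in cells)
{
    Console.WriteLine("[" + cell.Trim() + "]");
}
```

With RemoveEmptyEntries, as in the question's original call, you could no longer tell whether the first element came before a delimiter or after one, which is why this sketch uses None and drops index 0 explicitly.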
