I have some text in a C# [WebMethod] as such:

string myText = "<item>One</item><item>Two</item><item>Three</item>";

I want to split it into an array (myArray) with the following string at each index:

myArray[0] = <item>One</item>
myArray[1] = <item>Two</item>
myArray[2] = <item>Three</item>

This is how I am trying to achieve that:

string[] myArray = Regex.Split(myText, "</item><item>");

The problem is that I get this undesirable result:

myArray[0] = <item>One
myArray[1] = Two
myArray[2] = Three</item>

Regex.Split is clearly excluding the delimiter I used to split myText from the resulting array elements.

I have also tried:

string[] myArray = Regex.Split(myText, "$1" + "</item><item>" + "$2");

This one doesn't split the text at all. I am also open to suggestions for a different way to handle this.
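
To illustrate the two attempts above, here is a minimal sketch written as a C# 9+ top-level program (an assumption; in older projects the same statements go inside a Main method). It shows why the "$1"/"$2" pattern leaves the text unsplit, and one possible way to split at the boundary between tags without consuming them, using zero-width lookarounds. The variable name unsplit is introduced only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

string myText = "<item>One</item><item>Two</item><item>Three</item>";

// In a pattern, '$' is the end-of-input anchor, not a group reference, so
// "$1</item><item>$2" demands a literal '1' after the end of the input.
// That can never match, and Split returns the input as a single element.
string[] unsplit = Regex.Split(myText, "$1" + "</item><item>" + "$2");
Console.WriteLine(unsplit.Length);

// Zero-width alternative: match the empty position that has "</item>"
// behind it and "<item>" ahead of it. Nothing is consumed, so both tags
// stay attached to their elements.
string[] myArray = Regex.Split(myText, "(?<=</item>)(?=<item>)");
foreach (string item in myArray)
{
    Console.WriteLine(item);
}
```

With the sample input, the second call should yield three elements, each still wrapped in its own tags. This only works while the text follows the exact `</item><item>` layout; whitespace between tags or nested items would need a different pattern or a real parser.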

Additional info per suggestions in comments

I will be storing these 'items' as nodes in a BaseX DB. The problem I had with BaseX is that the 'insert into node...' XQUERY for BaseX is only good for inserting one node/item (as far as I know). So my plan here is to store all the items in an array and loop through each of them to run a BaseX XQUERY for each node/item separately. I hope I was clear :P

  • Why are you trying to do this? Regex is a particularly poor choice for parsing xml. Give up and use the right tool for the job. Why not use an xml parser instead? XDocument would be an excellent choice. – spender Jan 01 '15 at 23:18
  • Well, even if you're going to use Regex, I think you should split on only </item>, not the pair – niceman Jan 01 '15 at 23:28
  • After some consideration, I wielded my dupehammer here because you're approaching this wrong. You're actually trying to parse a bunch of xml fragments, and the correct way to do this is provided in the linked question. Regex is not an XML parser. Period. – spender Jan 01 '15 at 23:28
  • but if you split by </item> the output will be: <item>One, <item>Two, <item>Three, to which you can add </item> at the end, but maybe you should consider spender's approach instead. – niceman Jan 01 '15 at 23:31
  • @spender I will be storing these 'items' as nodes in a BaseX DB. The problem I had with BaseX is that the 'insert into node...' XQUERY for BaseX is only good for inserting one node/item (as far as I know). So my plan here is to store all the items in an array and loop through each of them to run a BaseX XQUERY for each node/item separately. I hope I was clear :P – supersophisticated Jan 01 '15 at 23:32
  • @supersophisticated I suggest adding this to the question. – niceman Jan 01 '15 at 23:33
  • Yep... would probably be worth a re-open if you provide more context... You don't/shouldn't need to use regex for this. – spender Jan 01 '15 at 23:33
  • @L.B, thanks for pointing it out (or back to my earlier post). In fact, I was aware of the suggestion made to use XDocument from that post. My choice of Regex was based on other questions here that I researched before working on mine. Since I've never used XDocument, it involves a learning curve, after which I will implement that approach. Doing it right is the way to go, I understand. And yes, clear! – supersophisticated Jan 01 '15 at 23:54
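
The XML-parser route suggested in the comments above could look roughly like the following sketch. It uses LINQ to XML (XElement rather than XDocument, since the fragments need a wrapper element anyway) and is written as a C# 9+ top-level program. The wrapper element name "root" is an arbitrary choice made for this example, not something from the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

string myText = "<item>One</item><item>Two</item><item>Three</item>";

// The fragments have no single root element, so wrap them in one before
// parsing. The name "root" is arbitrary and chosen only for this sketch.
XElement root = XElement.Parse("<root>" + myText + "</root>");

// Serialize each <item> child back to its markup, without added whitespace.
string[] myArray = root.Elements("item")
    .Select(e => e.ToString(SaveOptions.DisableFormatting))
    .ToArray();

foreach (string item in myArray)
{
    Console.WriteLine(item);
}
```

Each array element can then be passed to a separate insert query in a loop, as described in the question. Unlike the regex approaches, this fails with an exception on malformed markup instead of silently producing wrong pieces.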

1 Answer

-1

This should give you what you want:

// Requires: using System.Linq; using System.Text.RegularExpressions;
string[] myArray = Regex.Matches(myText, "<item>.*?</item>").Cast<Match>().Select(m => m.Value).ToArray();
snow_FFFFFF
  • What's up with the downvote? It does exactly what was asked in the question. I see there has been some debate as to whether he should be trying to do this, but that doesn't change the accuracy of the answer. – snow_FFFFFF Jan 01 '15 at 23:54
  • While it wasn't me who downvoted, it's always a good idea to steer people away from a solution to their [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) because the "correct" answer to an XY problem is not the correct answer to their actual problem. Supplying such an answer does not enrich the OP's skills... rather it provides a means to perpetuate bad practice. – spender Jan 02 '15 at 01:42
  • I could argue that your assumptions might be short-sighted. You assume you know what he is trying to do based on a couple of lines of code. It sure looks like xml, but you have no idea where it is coming from and why he needs to split the values. You are probably right, but someone else might need to split a similar, non-xml pattern and won't be able to use this as an example. – snow_FFFFFF Jan 02 '15 at 02:23