I would like to ask which regex I can use to split the text string by <math xmlns='http://www.w3.org/1998/Math/MathML'>....</math>

The result should be:

[image not available]

The code is:

        var text = @"{(test&<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><plus></plus><cn>1</cn><cn>2</cn></apply></math>)|(<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><root></root><degree><ci>m</ci></degree><ci>m</ci></apply></math>&nnm)&<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><power></power><cn>1</cn><cn>2</cn></apply></math>#<math xmlns='http://www.w3.org/1998/Math/MathML'><set><ci>l</ci></set></math>}";
        string findTagString = "(<math.*?>)|(.+?(?=<math/>))";
        Regex findTag = new Regex(findTagString);
        List<string> textList = findTag.Split(text).ToList();

I found a similar question, "Using Regex to split XML string before and after match", and I would like to ask for advice about the regex expression.

Thank you

Ori

    http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/ has much better explanations of the things you need to know about parsing XML with regular expressions. Make sure to read at least the top 20 answers carefully. – Alexei Levenkov Apr 14 '15 at 20:40

3 Answers

After some tests, I think that this will do the job:

string findTagString = "(<math.*?></math>)|((.*){}()#&(.*))</math>";
Ori

Here is my attempt, based on a zero-length look-ahead and look-behind:

(?=<math[^>]*>)|(?<=</math>)

Code:

string findTagString = "(?=<math[^>]*>)|(?<=</math>)";
var text = @"{(test&<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><plus></plus><cn>1</cn><cn>2</cn></apply></math>)|(<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><root></root><degree><ci>m</ci></degree><ci>m</ci></apply></math>&nnm)&<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><power></power><cn>1</cn><cn>2</cn></apply></math>#<math xmlns='http://www.w3.org/1998/Math/MathML'><set><ci>l</ci></set></math>}";
Regex findTag = new Regex(findTagString);
string[] textList = findTag.Split(text);
Console.WriteLine(string.Join("\n", textList));

Output of a sample program:

{(test&
<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><plus></plus><cn>1</cn><cn>2</cn></apply></math>
)|(
<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><root></root><degree><ci>m</ci></degree><ci>m</ci></apply></math>
&nnm)&
<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><power></power><cn>1</cn><cn>2</cn></apply></math>
#
<math xmlns='http://www.w3.org/1998/Math/MathML'><set><ci>l</ci></set></math>
}
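
An alternative sketch, not part of the original answer: in .NET, `Regex.Split` also returns the text of capturing groups, so wrapping the whole `<math ...>...</math>` element in a single capturing group should give the same kind of pieces. The names `MathSplitter` and `SplitAroundMath` are hypothetical, and the sketch assumes `<math>` elements are never nested inside each other.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MathSplitter
{
    // One capturing group around the whole element:
    // Regex.Split includes captured text in the returned array.
    private static readonly Regex MathElement =
        new Regex(@"(<math\b[^>]*>.*?</math>)", RegexOptions.Singleline);

    public static string[] SplitAroundMath(string text)
    {
        // Drop the empty entries produced when the text starts or ends
        // with an element, or when two elements are adjacent.
        return MathElement.Split(text).Where(s => s.Length > 0).ToArray();
    }
}
```

For a shortened input such as `{(a&<math xmlns='x'><cn>1</cn></math>)|(b)}`, `MathSplitter.SplitAroundMath` returns three pieces: `{(a&`, the `<math>` element, and `)|(b)}`.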
Wiktor Stribiżew

I would advise against using regular expressions on XML. XML is not a regular language, so regular expressions are a poor fit for it. Besides, .NET provides such convenient tools for parsing XML that I really don't see the point.

My suggestion is that you use LINQ to XML instead of regexes.
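
As a hedged illustration of that suggestion: even when the surrounding text is not XML, each `<math>...</math>` fragment may be well-formed on its own, so it could be handed to LINQ to XML once it has been isolated. The sketch below assumes the fragment has already been extracted from the larger string; `MathInspector` and `ElementNames` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MathInspector
{
    // Parses one isolated <math> fragment and lists the local names
    // of its elements in document order.
    public static List<string> ElementNames(string mathFragment)
    {
        // XElement.Parse throws if the fragment is not well-formed XML.
        XElement root = XElement.Parse(mathFragment);
        return root.DescendantsAndSelf()
                   .Select(e => e.Name.LocalName)
                   .ToList();
    }
}
```

For the first fragment in the question, this yields `math`, `apply`, `plus`, `cn`, `cn`.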

Motti
  • Hi Motti, thank you for your advice. The input I get is a string that contains XML; it is not a valid XML document, so I cannot use an XML parser in the code. The only way to do what I need is to use a regex or complex substring handling. Regex is easier and faster. – Ori Apr 17 '15 at 08:10