RegEx cannot match when there are 0's using [^\000]*?

Question

Good Day,

Is there an alternative way to get everything inside a tag using regex? Here is my code:

   MatchCollection matches = Regex.Matches(chek, "<bib-parsed>([^\000]*?)</bib-parsed>");

Here is the sample input:

   <bib-parsed>
   <cite>
   <pubinfo>
   <pub-year><i>1984</i></pub-year>
   <pub-place>Albuquerque</pub-place>
   <pub-name>Maxwell Museum of Anthropology and the University of New Mexico Press        </pub-name>
   </pubinfo>
   <bkinfo>
   <btl>The Galaz Ruin: A Prehistoric Mimbres Village in Southwestern New Mexico</btl>
   </bkinfo>
   </bib-parsed>

The sample above is matched, but when there are 0's inside the pub-year, like "2001", the matching fails. Is there any alternative for this? Thanks.
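A likely explanation for the failure, assuming the pattern is written as a regular (non-verbatim) C# string literal exactly as shown: C# has no octal escapes, so `\0` is the NUL character and the two zeros after it are ordinary characters. The regex engine then receives a negated class that excludes both NUL and the digit `0`. The sketch below reproduces this and shows one possible alternative, `.*?` with `RegexOptions.Singleline`. The class and method names are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexZeroDemo
{
    // Regular string literal: "\000" is NUL followed by two literal '0' characters,
    // so the character class rejects the digit 0 as well as NUL.
    public const string Original = "<bib-parsed>([^\000]*?)</bib-parsed>";

    // Verbatim literal; used with Singleline so that '.' also matches newlines.
    public const string Alternative = @"<bib-parsed>(.*?)</bib-parsed>";

    public static bool OriginalMatches(string input) =>
        Regex.IsMatch(input, Original);

    public static bool AlternativeMatches(string input) =>
        Regex.IsMatch(input, Alternative, RegexOptions.Singleline);

    static void Main()
    {
        string with1984 = "<bib-parsed>\n<pub-year>1984</pub-year>\n</bib-parsed>";
        string with2001 = "<bib-parsed>\n<pub-year>2001</pub-year>\n</bib-parsed>";

        Console.WriteLine(OriginalMatches(with1984));    // True
        Console.WriteLine(OriginalMatches(with2001));    // False
        Console.WriteLine(AlternativeMatches(with2001)); // True
    }
}
```

This only addresses the zero problem. It does not make regex a robust way to handle nested or malformed markup.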

Noooooooooooooooooooooooooooo! http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not, obligatory: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 I sense the pit opening... — Mitch Wheat, Oct 18 '13 at 02:01

Joshua Honig · Accepted Answer · 2013-10-18T12:21:02.450

6

It appears your input is valid XML. If this is the case, use the XML parsers in either System.Xml or System.Xml.Linq. They are extremely fast. For an input string containing multiple chunks like your example, using the System.Xml.Linq namespace objects:

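   // Requires: using System.Linq; using System.Xml.Linq;
   // e.Value is the concatenated text content of each <bib-parsed> element.
   var bibChunks = XDocument.Parse(yourXmlString)
                            .Descendants("bib-parsed")
                            .Select(e => e.Value);

   foreach (string chunk in bibChunks) {
       // do stuff
   }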

That's all there is to it.
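As a self-contained sketch of this approach, the program below parses a string holding two `<bib-parsed>` chunks. Several things here are assumptions and not part of the original post: the `<root>` wrapper (`XDocument.Parse` requires a single root element), the shortened sample data, and the names `BibParsedDemo`, `InnerText` and `InnerXml`. Note also that `e.Value` returns only the text content. If the markup inside the tag is wanted, one option is to concatenate the child nodes, as `InnerXml` does. The input must be well-formed for this to work. The sample in the question, for instance, has no closing `</cite>`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class BibParsedDemo
{
    // Text content of each <bib-parsed> element (tags stripped).
    public static List<string> InnerText(string xml) =>
        XDocument.Parse(xml)
                 .Descendants("bib-parsed")
                 .Select(e => e.Value)
                 .ToList();

    // Markup inside each <bib-parsed> element, serialized without re-indentation.
    public static List<string> InnerXml(string xml) =>
        XDocument.Parse(xml)
                 .Descendants("bib-parsed")
                 .Select(e => string.Concat(
                     e.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting))))
                 .ToList();

    static void Main()
    {
        // Hypothetical <root> wrapper around two well-formed chunks.
        string xml =
            "<root>" +
            "<bib-parsed><cite><pub-year>2001</pub-year></cite></bib-parsed>" +
            "<bib-parsed><cite><pub-year>1984</pub-year></cite></bib-parsed>" +
            "</root>";

        foreach (string chunk in InnerXml(xml))
            Console.WriteLine(chunk);
    }
}
```

Zeros in the data need no special handling here, because the parser works on the element structure and not on character classes.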

edited Oct 18 '13 at 12:21

answered Oct 18 '13 at 02:17


Hi, I'm encountering a problem. Is this for well-formed XML files? I'm putting errors in it, but the compilation fails. – Erick Reyes Oct 18 '13 at 06:54
@ErickReyes Sorry I misspelled `Descendants`. I've corrected it now. – Joshua Honig Oct 18 '13 at 12:21
