I'm working on a self-proclaimed cool project in which you extract XML data using SQL syntax.

<?xml version="1.0" encoding="utf-8"?>
<data attr1='some data' attr2='some data'>
    <personalData>
        <name>Mario</name>
        <lastName>Legenda</lastName>
        <birthData>
            <day>18</day>
            <month num="06">june</month>
            <godina>1986</godina>
        </birthData>
        <sex>M</sex>
        <death>N/A</death>
        <OIB>569874125369</OIB>
        <JMBG>25698745212</JMBG>
        <misc>
            <employed>n</employed>
            <student>n</student>
            <intelligence>n</intelligence>
            <tolerant>n</tolerant>
            <specialPowers>n</specialPowers>
            <married>n</married>
            <relationshipStatus>n</relationshipStatus>
            <socialLife>n</socialLife>
        </misc>
    </personalData>
</data>

To fetch the entire 'data' tag, the SQL would be SELECT data FROM path/to/file/data.xml. After certain classes verify that the syntax is correct, data fetching starts.
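Since the project is about learning regex, the query string itself is a reasonable first thing to match. The sketch below is hypothetical: `parseSelect` and the exact grammar it accepts (one tag name, then a path with no spaces) are assumptions for illustration, not the project's actual syntax checker.

```php
<?php
// Hypothetical helper: split "SELECT <tag> FROM <path>" into its parts.
// The grammar is an assumption made for this example.
function parseSelect(string $query): ?array
{
    // ^...$ anchor the whole string, \s+ requires whitespace between the
    // keywords, and the i flag makes SELECT/FROM case-insensitive.
    $pattern = '#^\s*SELECT\s+([A-Za-z_][\w.-]*)\s+FROM\s+(\S+)\s*$#i';
    if (preg_match($pattern, $query, $m) !== 1) {
        return null; // not a valid query under this toy grammar
    }
    return ['tag' => $m[1], 'path' => $m[2]];
}

print_r(parseSelect('SELECT data FROM path/to/file/data.xml'));
```

Anchoring with `^` and `$` matters here: without them, a string with trailing garbage would still be accepted.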

I want to do this project with regex, not with DOM, SimpleXML or other parsers, because I want to learn regex better. So I'm trying to check whether the 'data' tag exists in the specified XML. I do this with:

 preg_match('#<data\s?([\w]+=[\w]+\s?)+?>#i', $XmlAsString, $match);

The ? operator is giving me trouble. It doesn't seem to recognize that \s matches whitespace, so I only get the attr2 attribute in the $match array.
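A minimal sketch of what seems to be going on, assuming a test tag with unquoted values (the question does not show the exact string that produced the attr2 result). Two things are at work: a repeated capturing group only keeps its last iteration, and `[\w]+` cannot match quoted values such as `attr1='some data'`. The second half shows one regex-only way to list every attribute of a simple opening tag; it is still far from a real XML parser.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '#<data\s?([\w]+=[\w]+\s?)+?>#i';

// 1) A repeated capturing group keeps only its LAST iteration.
//    Assumed test string with unquoted values:
preg_match($pattern, '<data attr1=one attr2=two>', $match);
print_r($match); // $match[1] is "attr2=two"; the first iteration was overwritten

// 2) [\w]+ matches neither quotes nor spaces, so the quoted attributes
//    in the question's XML do not match at all.
var_dump(preg_match($pattern, "<data attr1='some data' attr2='some data'>")); // int(0)

// One way to get every attribute: capture the attribute text of the
// opening tag once, then let preg_match_all find each name='value' pair.
// Not a real XML parser (no entity decoding, no '>' inside values, etc.).
$xml = "<data attr1='some data' attr2='some data'>";
$attrs = [];
if (preg_match('#<data\b([^>]*)>#i', $xml, $open) === 1) {
    // (["\']) captures the opening quote; \2 requires the same quote to close.
    preg_match_all('#([\w:.-]+)\s*=\s*(["\'])(.*?)\2#', $open[1], $attrs, PREG_SET_ORDER);
}
foreach ($attrs as $a) {
    echo $a[1], ' => ', $a[3], PHP_EOL;
}
```

The lazy `+?` is not the culprit: whether lazy or greedy, the group is repeated, and PCRE reports only the final repetition in `$match[1]`.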

Balayesu Chilakalapudi
Mario Legenda
  • Instead of `\s` I believe you need to use `\\s`. Same for other backslashes. – mah Mar 18 '14 at 11:30
  • In general, this might be of interest in your situation: http://stackoverflow.com/questions/8577060/why-is-it-such-a-bad-idea-to-parse-xml-with-regex – anderas Mar 18 '14 at 11:30
  • Of possible relevance and internet fame: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – jonrsharpe Mar 18 '14 at 11:31
  • Thanks. Moving on to Dom, SimpleXml and others – Mario Legenda Mar 18 '14 at 11:41
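A quick way to check the `\\s` suggestion from the first comment, for the single-quoted string the question uses: in a single-quoted PHP string only `\\` and `\'` are escape sequences, and any other backslash is kept as-is. So, as far as this sketch shows, both spellings hand PCRE the same pattern.

```php
<?php
// In single quotes, '\s' stays as backslash + s, and '\\s' collapses to
// backslash + s, so the two strings are identical.
var_dump('\s' === '\\s');   // bool(true)
var_dump(strlen('\s'));     // int(2)

// Both spellings therefore match the same whitespace:
var_dump(preg_match('#a\sb#', 'a b'));   // int(1)
var_dump(preg_match('#a\\sb#', 'a b'));  // int(1)
```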

1 Answer

It will be a very cool project if you can find a way to do this, because established computer science theory says that XML is not a regular language (its grammar is recursive: elements can nest inside elements to any depth), and therefore it cannot be parsed by means of regular expressions. If you make this work, you will have disproved a fundamental theorem of computer science.
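A small illustration of the nesting problem. A lazy pattern pairs an opening tag with the first closing tag it sees, which is wrong for nested elements, and a greedy one is wrong for siblings. The `isBalanced` function below is a hypothetical sketch: it uses a regex only to find individual tags (that part is regular) and an explicit stack to pair them, because the pairing needs memory that can grow with nesting depth, which a finite automaton does not have. It ignores comments, CDATA and other XML details.

```php
<?php
// Nested elements with the same name: lazy .*? stops at the FIRST </item>,
// which actually closes the inner element.
preg_match('#<item>(.*?)</item>#', '<item><item>inner</item>outer</item>', $m);
echo $m[1], PHP_EOL; // <item>inner

// Greedy .* fails the other way on sibling elements.
preg_match('#<item>(.*)</item>#', '<item>a</item><item>b</item>', $g);
echo $g[1], PHP_EOL; // a</item><item>b

// Hypothetical checker: regex to tokenize tags, stack to pair them.
function isBalanced(string $xml): bool
{
    // Groups: 1 = "/" for a closing tag, 2 = tag name, 3 = "/" if self-closing.
    preg_match_all('#<(/?)([\w:.-]+)[^>]*?(/?)>#', $xml, $tags, PREG_SET_ORDER);
    $stack = [];
    foreach ($tags as [$all, $closing, $name, $selfClosing]) {
        if ($selfClosing === '/') {
            continue; // <br/> opens and closes itself
        }
        if ($closing === '/') {
            if (array_pop($stack) !== $name) {
                return false; // closing tag does not match the innermost open tag
            }
        } else {
            $stack[] = $name;
        }
    }
    return $stack === []; // every opened tag was closed
}

var_dump(isBalanced('<item><item>inner</item>outer</item>')); // bool(true)
var_dump(isBalanced('<a><b></a></b>'));                       // bool(false)
```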

Michael Kay
  • For starters, I will use XMLReader and SimpleXML just to see it work. BUT, now that you said what you said, challenge accepted. It will take a long time, but it's not like I have a social life. – Mario Legenda Mar 18 '14 at 15:53
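For comparison, a minimal SimpleXML sketch of the same lookup (`SELECT data`), using a shortened copy of the question's XML inline instead of path/to/file/data.xml; with a real file, `simplexml_load_file()` would replace `simplexml_load_string()`. This is an illustration, not the project's implementation.

```php
<?php
// Shortened copy of the question's document (assumption: inline string
// instead of reading path/to/file/data.xml).
$xmlString = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<data attr1='some data' attr2='some data'>
    <personalData>
        <name>Mario</name>
        <lastName>Legenda</lastName>
    </personalData>
</data>
XML;

$root = simplexml_load_string($xmlString);
if ($root === false) {
    throw new RuntimeException('Not well-formed XML');
}

echo $root->getName(), PHP_EOL; // data
// Unlike the regex, this yields every attribute, not just the last one.
foreach ($root->attributes() as $name => $value) {
    echo $name, ' = ', (string) $value, PHP_EOL;
}
echo (string) $root->personalData->name, PHP_EOL; // Mario
```

A useful side effect: the parser rejects malformed input, such as a mismatched closing tag, instead of silently matching part of it.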