I have an XML file that is arranged like so:
<xml:head>
<xml:reportObject>
<xml:device>
<device:id>
<id:value = value />
</device:id>
<device:OpAttributes>
<OpAttributes:value = value />
</device:OpAttributes>
<device:Config>
<Config:NetConfig>
<NetIF:ID = value />
<NetHost>
<NetHost:MAC = value />
</NetHost>
</Config:NetConfig>
</device:Config>
<device:Role = value />
<device:TaggedString name="value" value="value" />
<device:Addition junk ........ />
</xml:device>
</xml:reportObject>
Lather, rinse, repeat for several more reportObject instances
</xml:head>
My problem is that I am trying to parse out three values (specifically the "NetHost:MAC", "device:Role" and the "device:TaggedString" values) to dump into their place in a database column.
The program we use is an in-house tool that does this based on RegEx matches, but because the XML flatlines after the "xml:device" tag, I am left searching for a way to match everything within the "xml:device" tags before parsing further. The kicker is that I can only continue to parse if the "device:Role" value is a client; anything else gives too much junk and my parsing bombs.
My most recent attempt (and subsequent failure) to do this looks like this:
<xml:device([\s\S]+?(\b\w*Client\w*\b))</xml:device>
This works for about 90% of my matches, but somewhere in the file the [\s\S]+? matches too far down: when a device has no Client match of its own, the match runs on past its closing tag into a later device, and my parsing still bombs.
Any help will keep me from pulling the rest of my hair out.
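To make the failure concrete, here is a minimal sketch in Python. Python's `re` module is only a stand-in for the in-house tool's engine, and the two-device sample string is hypothetical and flattened so that my pattern matches it as written. The second pattern is one possible tightening I have not verified in the tool: it assumes the engine supports negative lookahead and refuses to let the match step over a closing `</xml:device>`.

```python
import re

# Hypothetical, flattened sample: the first device is not a client, the second is.
SAMPLE = (
    "<xml:device> id=1 ServerRole</xml:device>"
    "<xml:device> id=2 ClientRole</xml:device>"
)

# The pattern from my attempt, unchanged. The lazy [\s\S]+? is free to run
# past the first </xml:device> while it looks for a word containing "Client".
LAZY = re.compile(r"<xml:device([\s\S]+?(\b\w*Client\w*\b))</xml:device>")

# Assumed variant: every character consumed must not start a closing
# </xml:device>, so one match can never span two devices.
TEMPERED = re.compile(
    r"<xml:device((?:(?!</xml:device>)[\s\S])+?(\b\w*Client\w*\b))</xml:device>"
)

lazy_match = LAZY.search(SAMPLE)
tempered_match = TEMPERED.search(SAMPLE)

# The lazy pattern swallows the non-client device along with the client one.
print("lazy spans both devices:", lazy_match.group(0) == SAMPLE)

# The tempered pattern skips the non-client device and matches only the client.
print("tempered match:", tempered_match.group(0))
```

If the tool's engine does not support lookahead, this variant will not work there, so treat it as something to try and not as a confirmed fix.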
RegEx is the only option I have to do this parsing at the moment via our in-house tool. If you can think of something different, please let me know.