Suppose I have a large XML file (200 to 1000+ MB) and I just need to pull a very small subset of its data in the most efficient way.
Given a great solution from one of my previous questions, I ended up coding a solution that uses an XmlReader mixed with XmlDocument / XPath.
So, supposing I have the following XML:
<Doc>
    <Big_Element1>
        ... LOTS of sub-elements ...
    </Big_Element1>
    .....
    <Small_Element1>
        <Sub_Element1_1 />
        ...
        <Sub_Element1_N />
    </Small_Element1>
    .....
    <Small_Element2>
        <Sub_Element2_1 />
        ...
        <Sub_Element2_N />
    </Small_Element2>
    .....
    <Big_ElementN>
        .......
    </Big_ElementN>
</Doc>
All I really need is the data from the Small_Elements. The Big_Elements are definitely very large (each contains many small sub-elements), so I'd like to avoid even entering them if I don't have to.
I came up with the following solution:
' Requires Imports System.Xml at the top of the file.
Dim doc As XmlDocument
Dim xNd As XmlNode

Using reader As XmlReader = XmlReader.Create(uri)   ' uri points at the large XML file
    reader.MoveToContent()
    While reader.Read()
        If reader.NodeType = XmlNodeType.Element Then
            Select Case UCase(reader.Name)
                Case "SMALL_ELEMENT1"
                    ' Load just this small element into an in-memory document.
                    doc = New XmlDocument()
                    xNd = doc.ReadNode(reader)
                    GetSmallElement1Data(xNd)
                Case "SMALL_ELEMENT2"
                    doc = New XmlDocument()
                    xNd = doc.ReadNode(reader)
                    GetSmallElement2Data(xNd)
            End Select
        End If
    End While
End Using
GetSmallElement1Data and GetSmallElement2Data are easy enough for me to deal with, since the nodes they receive are small, so I use XPath inside them to get the data I need.
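For reference, those helpers are roughly along these lines; the sub-element names are just the placeholders from the sample XML above, not my real schema:

Private Sub GetSmallElement1Data(xNd As XmlNode)
    ' xNd is the detached <Small_Element1> node, so these XPath queries
    ' only ever touch that small subtree, never the big elements.
    Dim firstSub As XmlNode = xNd.SelectSingleNode("Sub_Element1_1")
    If firstSub IsNot Nothing Then
        ' ... pull out whatever I need, e.g. firstSub.InnerText ...
    End If

    ' Or walk all of the Sub_Element1_* children at once.
    For Each subNode As XmlNode In xNd.SelectNodes("*[starts-with(name(), 'Sub_Element1_')]")
        ' ... process each sub-element ...
    Next
End Sub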
But my question is: it seems this reader still walks through the entire XML rather than just skipping over the Big_Elements. Is that actually what's happening, or is this not the correct way to have programmed this?
Also, I know this sample code is written in VB.NET, but I'm equally comfortable with C# or VB.NET solutions.
Any help / thoughts would be great!
Thanks!