2

I have a large xml response from querying an api ~2,000,000 lines and I am unsure how to parse this in a timely manner. I only need to get the value of 2 elements for every node as I go through the file and save it to a hashtable. It has been an hour since running the code below and the script is not done parsing. Any ideas?

UPDATE: I found this question here on Stackoverflow How can i use XmlReader in PowerShell to stream big/huge XML files? It parses really fast.

Xml file excerpt

<Response>
<Node_List>
<Node>
<Element 1></Element 1>
<Element 2></Element 2>
<Element 3></Element 3>
<Element 4></Element 4>
<Element 5></Element 5>
<Element 6></Element 6>
<Element 7></Element 7>
<Element 8></Element 8>
</Node>
<Node>
</Node>
</Node_List>
</Response> 

Powershell Code

$hashTable = $null
$hashTable = @{}

[xml]$response = API query here

$elementCount = 
$response.SelectNodes("path/to/get/element/count").Count

for($i=0; $i -lt $elementCount; $i++)
{

[string] $elementOne = $response.SelectNodes("/path/to/first/element")[$i].InnerText
[string] $elementOne = $response.SelectNodes("/path/to/second/element")[$i].InnerText

try
{
$hashtable.Add($elementOne,$elementTwo)

}
catch
{
# This is to catch the exceptions that come up when duplicate values are added to the 
hashtable
}
  • Can you post a representative subset of the file contents? (eg. if you have a roo node with 2 million child nodes, post an excerpt with only a handful of child nodes) – Mathias R. Jessen Dec 17 '21 at 19:20
  • There are ~250,000 nodes each node has 8 elements in it – user1884185 Dec 17 '21 at 19:23
  • 3
    Loading such a large document into DOM will be very inefficient and resource-consuming. Using a pull-parser such as [`XmlReader`](https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmlreader?redirectedfrom=MSDN&view=net-6.0#examples) will be more efficient as it gives you the XML elements as soon as it has read them (contrary to DOM which requires to load the whole document into memory before you can parse it). PowerShell code sample without `async`: https://stackoverflow.com/a/26820939/7571258 – zett42 Dec 17 '21 at 19:48
  • What @zett42 said - `XmlReader` is likely the best option here. Can you show us how you make the API call? You might need to change that code to provide a stream object instead of the whole XML string at once (this could be achieved by writing the XML response to disk first) – Mathias R. Jessen Dec 17 '21 at 21:50
  • 1
    An example https://henry416.wordpress.com/2012/10/05/reading-xml-with-the-xmlreader-in-powershell/ – lit Dec 17 '21 at 22:23
  • @lit Thanks for the example – user1884185 Dec 18 '21 at 03:00
  • @MathiasR.Jessen It is a standard REST api call with Invoke-RestMethod. I download the least amount of data I can and funny enough it only takes 3 minutes to download – user1884185 Dec 18 '21 at 03:05
  • Does anyone have anymore examples of using XMlReader in Powershell? I cannot figure out how to use it with these random errors I am getting. Ex: ""'Text' is an invalid XmlNodeType" when trying to parse an element – user1884185 Dec 20 '21 at 14:24
  • @DonCruickshank Yes it does – user1884185 Dec 20 '21 at 22:02
  • For reference I was able to parse all 2,000,000 lines and extract values using the code in the Stackoverflow link I posted in 6 minutes – user1884185 Dec 20 '21 at 22:53

0 Answers0