Powershell - XML - How do I extract multiple values from various depths per node family

Question

I've got several million lines of xml to parse. For one application I am looking to extract 3 pieces of data for use in other scripts.

The xml looks something like the following (several dozen tags have been removed per grouping) I can change one of the name tags if it helps; though not desirable it will require some intermediate processing. Not all node groups have the extended attributes.

<?xml version="1.0" encoding="IBM437"?>
<topo>
    <node>
        <name>device1Name</name>
         <extendedAttributes>
            <attribute>
                <name>tagCategoryName</name>
                <value>tagValue</value>
            </attribute>
        </extendedAttributes>
     </node>
    <node>
        <name>device2Name</name>
        <extendedAttributes>
            <attribute>
                <name>tagCategoryName</name>
                <value>tagValue</value>
            </attribute>
        </extendedAttributes>
    </node>
    <node>
        <name>device3Name</name>
    </node>
...
...
</topo>

The output I am looking for each node is

deviceName   tagCategoryName   tagValue

I've attempted several approaches and have been unable to find an elegant solution. Started with

$xml = [xml](get-content prodnodes.txt)

Tried some Select-Xml with xpath, with direct $xml.topo.node addressing piping to select object using property names. I was unable to target the names effectively with the following.

$xml.topo.node | select-object -property name, extendedAttributes.attribute.name, extendedAttributes.attribute.value

It would return only the name The following worked to get me an additional attribute but I couldn't extend it without issues.

$munge = $xml.topo.node | select-object -property name, {$_.extendedAttributes.attribute.name}

Attempting to extend it looked like this

$munge = $xml.topo.node | select-object -property name, {$_.extendedAttributes.attribute.name, $_.extendedAttributes.attribute.value}

which gave output like this

deviceName1   {tagCategoryName1, tagValue1}
deviceName2   {tagCategoryName1, tagValue2}
deviceName3   {$null, $null}
deviceName4   {tagCategoryName2, tagValue3}
...
...

Is there a way to clean this up, or another approach that is more effective?

score 0 · Accepted Answer · answered Jan 14 '20 at 23:35

Your first approach was nearly correct. That being said, in order to dig into properties like that, you need to use calculated properties.

Calculated properties are represented by a hashtable containing a name element, which will be your column name, and an expression element that contain a scriptblock to do something more than what you can with a simple select.

Here's how you need to do it in your scenario.

The statement

$xml.topo.node | select-object -property name, 
@{'Name' = 'TagName' ; 'Expression' = { $_.extendedAttributes.attribute.name } },
@{'Name' = 'TagValue' ; 'Expression' = {$_.extendedAttributes.attribute.value}}

The result

name        TagName         TagValue
----        -------         --------
device1Name tagCategoryName tagValue
device2Name tagCategoryName tagValue
device3Name

More information on this topic

Microsoft - Select-Object

4sysops - Add a calculated property with select object in powershell

Powershell - XML - How do I extract multiple values from various depths per node family

1 Answers1