How can I get the list of possible XPath queries for an xml object in PowerShell?
- Please supply a [mcve]. (If you just type `$xml`, PowerShell should show you the root property. Anyway, it is probably easier to use PowerShell's XML dot notation.) – iRon Nov 09 '22 at 10:42
- Please show a sample XML document. – Mathias R. Jessen Nov 09 '22 at 11:01
- The list of *possible* queries is infinite; XPath is a pretty flexible language that is not restricted to just querying by element name. As Ron mentions, autocomplete with dot notation is a very convenient way of exploring the data. As a bonus, this disregards namespaces, while XPath is very much respectful of them: `([xml]'a ')|select-xml "/x"` will yield nothing, unlike `([xml]'a ').x`. – Jeroen Mostert Nov 09 '22 at 12:44
- @iRon I skipped the rss tag while reading the XML file; that was the problem. Still, my question is about listing the possible XML paths. – Gergely Nov 09 '22 at 13:14
- All (recursive) XML paths??? Or at a certain level (e.g. `$xml.rss`)? *My* question (to be able to better help you) is still: please supply a [mcve]. – iRon Nov 09 '22 at 13:46
- You'll still have to narrow down your question to what it really is you're after; as it stands it's unanswerable. Any node can be selected by very many queries that end up referring to the same node ("the third element", "the first non-empty element", "the first element that has a child named Bob" and "the element named Alice" are all expressible in XPath and could all be the same element). If you just want to visualize the XML document's structure, I suggest writing it to a file and using one of the many XML editors out there (or even just your browser). – Jeroen Mostert Nov 09 '22 at 14:52
1 Answer
As the comments note, it is impossible to list all possible XPath queries for a given XML document, given the complexity of this open-ended query language, with different ways to target the same nodes, ...
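To illustrate the point about equivalent queries, here is a minimal sketch. The sample document and the four queries are made up for illustration and are not taken from the question; each query selects the same `<Alice>` element:

```powershell
# Hypothetical sample document, for illustration only.
$doc = [xml] '<team><Zed/><Alice><Bob/></Alice><Carol/></team>'

# Four different XPath expressions that all target the same element.
$queries = @(
  '/team/Alice'       # the element named Alice
  '/team/*[2]'        # the second child element of <team>
  '/team/*[Bob][1]'   # the first child element that has a child named Bob
  '//Bob/..'          # the parent of the (only) Bob element
)

# Report which element each query selects.
$selected = foreach ($query in $queries) {
  $node = $doc.SelectSingleNode($query)
  [pscustomobject] @{ Query = $query; Selected = $node.LocalName }
}
$selected
```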
However, it is possible and may be useful to output XPath path expressions to the leaf elements of a document, so as to get a sense of the document structure, and to be able to formulate XPath queries based on them.
Assuming that helper function `Get-XmlElementPath` is defined (source code below), you can do something like the following:
# Sample XML doc.
$xmlDocText = @'
<?xml version="1.0"?>
<doc>
  <catalog>
    <book id="bk101">
      <title>De Profundis</title>
    </book>
    <book id="bk102">
      <title>Pygmalion</title>
    </book>
  </catalog>
  <foo>
    <bar>one</bar>
    <bar>two</bar>
  </foo>
</doc>
'@

Get-XmlElementPath $xmlDocText
This outputs the following strings, representing the XPath path expressions that select the document's leaf elements:
/doc/catalog/book[@id="bk101"]/title
/doc/catalog/book[@id="bk102"]/title
/doc/foo/bar[1]
/doc/foo/bar[2]
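As a sketch of how such path expressions could then be used, the following passes two of them to `Select-Xml`. The variable names (`$sampleXml`, `$paths`, `$texts`) are chosen for illustration, and the document is a condensed copy of the sample above:

```powershell
# Condensed copy of the sample document from above.
$sampleXml = @'
<?xml version="1.0"?>
<doc>
  <catalog>
    <book id="bk101"><title>De Profundis</title></book>
    <book id="bk102"><title>Pygmalion</title></book>
  </catalog>
  <foo><bar>one</bar><bar>two</bar></foo>
</doc>
'@

# Two of the path expressions listed above.
$paths = '/doc/catalog/book[@id="bk102"]/title', '/doc/foo/bar[2]'

# Select each element and collect its text content.
$texts = foreach ($path in $paths) {
  (Select-Xml -Content $sampleXml -XPath $path).Node.InnerText
}
$texts  # -> Pygmalion, two
```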
Note:
- Caveat: The function does not (fully) support namespaces: while elements with explicit namespace prefixes are reported as such, those implicitly in a namespace are reported by their name only. If the input document uses namespaces and you want to query it based on the path expressions returned, you'll need to:
  - Create a namespace manager with self-chosen prefixes to refer to the namespace URIs, including the default one.
  - Use these prefixes in the XPath path expression, even for elements that are in the default namespace.
  - The following answers demonstrate these techniques:
    - In the context of the `.SelectNodes()` and `.SelectSingleNode()` .NET API methods: see this answer.
    - In the context of the `Select-Xml` cmdlet: see this answer.
- Only element nodes are considered, and only leaf elements, i.e. those elements that themselves do not have any element children.
- If a given child element has an `"id"` or `"name"` attribute, its path is represented with an XPath conditional (`[@id="..."]` or `[@name="..."]`; `"id"` takes precedence), under the assumption that these values are unique (at least among the sibling elements).
- Multiple child elements with the same name that do not have `"id"` or `"name"` attributes are each represented by their 1-based positional index (e.g., `[1]`).
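The namespace caveat can be sketched as follows. The namespace URI `http://example.org/ns`, the prefix `x`, and all variable names are assumptions made up for this illustration; the point is only that a path reported without prefixes must be rewritten with a self-chosen prefix before it matches anything:

```powershell
# Hypothetical document whose elements are all in a default namespace.
$nsUri = 'http://example.org/ns'
$nsXml = [xml] "<doc xmlns=`"$nsUri`"><foo><bar>one</bar></foo></doc>"

# Dot notation ignores namespaces, so this works as-is.
$viaDot = $nsXml.doc.foo.bar

# A prefix-less XPath query does NOT match namespaced elements.
$noMatchCount = $nsXml.SelectNodes('/doc/foo/bar').Count

# Select-Xml: map a self-chosen prefix to the namespace URI via -Namespace.
$viaCmdlet = (Select-Xml -Xml $nsXml -XPath '/x:doc/x:foo/x:bar' -Namespace @{ x = $nsUri }).Node.InnerText

# .NET API: register the same prefix with a namespace manager.
$nsMgr = [System.Xml.XmlNamespaceManager]::new($nsXml.NameTable)
$nsMgr.AddNamespace('x', $nsUri)
$viaApi = $nsXml.SelectSingleNode('/x:doc/x:foo/x:bar', $nsMgr).InnerText

$viaDot, $noMatchCount, $viaCmdlet, $viaApi
```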
`Get-XmlElementPath` source code; run `Get-XmlElementPath -?` for help:
function Get-XmlElementPath {
  <#
  .SYNOPSIS
    Outputs XPath paths for all leaf elements of a given XML document.
  .DESCRIPTION
    Leaf elements are those XML elements that have no element children.
    If a given child element has an "id" or "name" attribute, its path is
    represented with an XPath conditional ([@id="..."] or [@name="..."]).
    Multiple child elements with the same name that do not have "id" or "name"
    attributes are each represented by their 1-based positional index.
    Note: Namespaces are NOT (fully) supported: while elements with
          explicit namespace prefixes are reported as such, those
          that are implicitly in a namespace are reported by name only.
  .EXAMPLE
    Get-XmlElementPath '<catalog><book id="bk101">De Profundis</book><book id="bk102">Pygmalion</book></catalog>'
    /catalog/book[@id="bk101"]
    /catalog/book[@id="bk102"]
  #>
  param(
    [Parameter(Mandatory)] $Xml,            # string, [xml] instance, or [XmlElement] instance
    [Parameter(DontShow)] [string] $Prefix, # used internally
    [Parameter(DontShow)] [string] $Index   # used internally
  )
  # Normalize the input to an [XmlElement]: parse strings, unwrap documents.
  if ($Xml -is [string]) {
    $Xml = [xml] $Xml
  }
  if ($Xml -is [xml]) { $Xml = $Xml.DocumentElement }
  # Construct this element's path.
  $Prefix += '/' + $Xml.psbase.Name # !! .psbase.Name must be used to guard against a "name" *attribute* preempting the type-native property.
  if ($Index) { $Prefix += '[{0}]' -f $Index }
  $childElems = $Xml.ChildNodes.Where({ $_ -is [System.Xml.XmlElement] })
  if ($childElems) {
    # Create a hashtable that maps child element names to how often they occur.
    $htNames = [hashtable]::new() # Note: case-*sensitive*, because XML is.
    foreach ($name in $childElems.get_Name()) { $htNames[$name]++ }
    # Create a hashtable that maintains the per-name count so far in the iteration.
    $htIndices = [hashtable]::new()
    # Iterate over all child elements and recurse.
    foreach ($child in $childElems) {
      $Index = ''
      if ($htNames[$child.psbase.Name] -gt 1) { $Index = ++$htIndices[$child.psbase.Name] }
      # If an 'id' (or, failing that, a 'name') attribute is present, use it instead of a positional index.
      if ($id = $child.GetAttribute('id')) { $Index = '@id="{0}"' -f $id }
      elseif ($id = $child.GetAttribute('name')) { $Index = '@name="{0}"' -f $id }
      # Recurse
      Get-XmlElementPath $child $Prefix $Index
    }
  } else { # leaf element reached
    $Prefix # output the path
  }
}
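The `.psbase.Name` comment in the function refers to PowerShell's XML adapter, which surfaces attributes and child elements as properties and lets them shadow type-native properties of the same name. A minimal sketch of the effect, using a made-up element with a `name` attribute:

```powershell
# Hypothetical element that has a "name" attribute.
$doc = [xml] '<root><item name="foo"/></root>'
$element = $doc.SelectSingleNode('/root/item')

$adapted = $element.Name          # the "name" ATTRIBUTE value, surfaced by the XML adapter
$native  = $element.psbase.Name   # the element name, from the underlying .NET property
$getter  = $element.get_Name()    # calling the property getter directly also bypasses the adapter

$adapted, $native, $getter  # -> foo, item, item
```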
