What is the correct XPath syntax to match both attributes and elements?

More Info

I created the function below to find the elements and attributes that contain a given value:

function Get-XPathToValue {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [xml]$Xml
        ,
        [Parameter(Mandatory)]
        [string]$Value
    )
    process {
        $Xml.SelectNodes("//*[.='{0}']" -f ($Value -replace "'","''")) | %{
            $xpath = ''
            $elem = $_
            while (($elem -ne $null) -and ($elem.NodeType -ne 'Document')) {
                $xpath = '/' + $elem.Name + $xpath 
                $elem = $elem.SelectSingleNode('..')
            }
            $xpath
        }
    }
}

This matches elements, but not attributes.

By replacing $Xml.SelectNodes("//*[.='{0}']" with $Xml.SelectNodes("//@*[.='{0}']" I can match attributes, but not elements.

Example

[xml]$sampleXml = @"
<root>
    <child1>
        <child2 attribute1='hello'>
            <ignoreMe>what</ignoreMe>
            <child3>hello</child3>
            <ignoreMe2>world</ignoreMe2>
        </child2>
        <child2Part2 attribute2="ignored">hello</child2Part2>
    </child1>
    <notMe>
        <norMe>Not here</norMe>
    </notMe>
</root>
"@

Get-XPathToValue -Xml $sampleXml -Value 'hello'

Returns:

/root/child1/child2/child3
/root/child1/child2Part2

Should Return:

/root/child1/child2/attribute1
/root/child1/child2/child3
/root/child1/child2Part2

What have you tried?

I tried matching on:

  • //@*|*[.='{0}'] - returns matching elements, but all attributes.
  • //*|@*[.='{0}'] - returns matching attributes, but all elements.
  • //*[.='{0}']|@*[.='{0}'] - returns matching elements.
  • //@*[.='{0}']|*[.='{0}'] - returns matching attributes.
  • //(@*|*)[.='{0}'] - throws an exception.
JohnLBevan
  • Your algorithm to generate the XPath string is flawed. It falls apart as soon as there is more than one element with the same name in the same level. What's the point of it anyway, if I may ask? – Tomalak Mar 14 '17 at 14:57
  • Good point; I'll refine. This is just a utility to help me do analysis. I sometimes have to deal with large XML files, where I have to open them in a text editor, find the value that's being reported as wrong, then work out an xpath to the related element so I can write a script to compare this value with the same value in a number of other files to see if they also have issues. I don't like doing things manually if I can help it; so this just saves me a bit of effort. It's not code for any user-facing solution; just something for my utility belt. – JohnLBevan Mar 14 '17 at 15:04
  • There's another flaw in there. XPath has no string escaping. You can't replace `'` with `''` and all is well, that's not how this works. – Tomalak Mar 14 '17 at 15:11
  • And the third flaw is that it completely ignores XML namespaces. – Tomalak Mar 14 '17 at 15:22

2 Answers

Using the following XPath resolved the issue: //@*[.='{0}']|//*[./text()='{0}']

i.e.

function Get-XPathToValue {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [xml]$Xml
        ,
        [Parameter(Mandatory)]
        [string]$Value
    )
    process {
        $Xml.SelectNodes("//@*[.='{0}']|//*[./text()='{0}']" -f ($Value -replace "'","''")) | %{
            $xpath = ''
            $elem = $_
            while (($elem -ne $null) -and ($elem.NodeType -ne 'Document')) {
                $prefix = ''
                if($elem.NodeType -eq 'Attribute'){$prefix = '@'}
                $xpath = '/' + $prefix + $elem.Name + $xpath 
                $elem = $elem.SelectSingleNode('..')
            }
            $xpath
        }
    }
}
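
As a rough illustration of why the union in the function above needs // and a predicate on both sides: in XPath 1.0 the | operator joins two complete location paths, and a predicate applies only to the step it follows. The sketch below uses a made-up miniature document (the names a, b, c, attr and other are hypothetical, not taken from the question's sample) and is not meant as a definitive explanation of every attempt listed in the question.

```powershell
# Hypothetical miniature document, for illustration only.
[xml]$doc = "<root><a attr='hello'><b>hello</b><c other='x'>world</c></a></root>"

# Parsed as (//@*) | (child::*[.='hello']): every attribute in the document,
# plus matching children of the context node only (here, just <root>).
# Selects attr and other; no elements.
$doc.SelectNodes("//@*|*[.='hello']") | ForEach-Object { $_.Name }

# Repeating // and the predicate on both sides searches the whole tree.
# Selects attr and b.
$doc.SelectNodes("//@*[.='hello']|//*[./text()='hello']") | ForEach-Object { $_.Name }

# A parenthesised union followed by a single predicate is also valid XPath 1.0.
# Selects attr and b.
$doc.SelectNodes("(//@*|//*)[.='hello']") | ForEach-Object { $_.Name }
```

By contrast, //(@*|*)[...] puts the parentheses in the position of a location step, which XPath 1.0 does not allow; that is consistent with the exception mentioned in the question.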
JohnLBevan
  • Careful. That XPath will select `a` *and* `b` when testing for `"value-b"` for `<a><b>value-b</b></a>`. Please see [**Testing text() nodes vs string values in XPath**](http://stackoverflow.com/q/34593753/290085) – kjhughes Mar 14 '17 at 15:12
  • Thanks @kjhughes; corrected. I'd not done that when originally considering that attributes wouldn't have a text node (without considering other implications); but now they have separate conditions, that's so much the better. – JohnLBevan Mar 14 '17 at 15:16

Your method of deriving an XPath expression has three flaws, as indicated in the comments to your question.

  1. It does not handle the case where there are multiple elements with the same name at the same level.
  2. It does not handle quotes in values properly.
  3. It does not handle XML namespaces.

Here is my take on a function that addresses these points (I also gave it a name that I think is more appropriate within the cmdlet naming scheme):

function Convert-ValueToXpath {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [xml]$Xml
        ,
        [Parameter(Mandatory)]
        [string]$Value
    )
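    process {
        # XPath 1.0 has no escape sequences, so build the literal with concat():
        # split the value on single quotes and pass each quote as "'".
        $escapedValue = "concat('', '" + ($Value -split "'" -join "', ""'"", '") + "')"
        $Xml.SelectNodes("(//*|//@*)[normalize-space() = {0}]" -f $escapedValue) | ForEach-Object {
            $xpath = ''
            $elem = $_
            # Walk up from the matched node, prepending one step per level.
            while ($true) {
                if ($elem.NodeType -eq "Attribute") {
                    # An attribute is always the last step; continue from its owner element.
                    $xpath = '/@' + $elem.Name
                    $elem = $elem.OwnerElement
                } elseif ($elem.ParentNode) {
                    # Position = preceding siblings with the same name and namespace, plus one.
                    $precedingExpr = "./preceding-sibling::*[local-name() = '$($elem.LocalName)' and namespace-uri() = '$($elem.NamespaceURI)']"
                    $pos = $elem.SelectNodes($precedingExpr).Count + 1
                    $xpath = '/' + $elem.Name + "[" + $pos + "]" + $xpath
                    $elem = $elem.ParentNode
                } else {
                    # No parent left: the document node has been reached.
                    break
                }
            }
            $xpath
        }
    }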
}

For your sample input I get these XPaths:

/root[1]/child1[1]/child2[1]/@attribute1
/root[1]/child1[1]/child2[1]/child3[1]
/root[1]/child1[1]/child2Part2[1]
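
To make the quote handling in $escapedValue more concrete, here is a small sketch. It assumes a made-up value containing a single quote and a made-up two-element document; neither comes from the question.

```powershell
# Hypothetical value containing a single quote.
$value = "it's"

# Same expression as in the function: split on the quote and rejoin the
# pieces as concat() arguments, passing each quote as the literal "'".
$escapedValue = "concat('', '" + ($value -split "'" -join "', ""'"", '") + "')"
$escapedValue   # concat('', 'it', "'", 's')

# Hypothetical document: only <a> has the string value it's.
[xml]$doc = '<root><a>it''s</a><b>its</b></root>'
$doc.SelectNodes("//*[. = $escapedValue]") | ForEach-Object { $_.Name }   # a
```

The leading '' argument is there so that concat() still receives at least two arguments when the value contains no quote at all.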
Tomalak
  • Nice one, thanks @Tomalak. NB: I think the above still doesn't cover namespaces though? Could easily work around that by using `/*[local-name()='$($elem.Name)']` though. – JohnLBevan Mar 14 '17 at 16:09
  • It covers namespaces, but `[local-name() = '$($elem.LocalName)' and namespace-uri() = '$($elem.NamespaceURI)']` would be better in `$precedingExpr`. In fact, let's include this right now. – Tomalak Mar 14 '17 at 16:15
  • The issue is, it still does not generate XPaths that cover every way namespaces can be declared in XML. It's possible to do, but the resulting XPath becomes very bloated. – Tomalak Mar 14 '17 at 16:21
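
One case from the namespace discussion above can be sketched as follows, assuming a made-up document with a default namespace (urn:example is hypothetical). In XPath 1.0 an unprefixed name test only matches elements in no namespace, so a generated path built from plain element names selects nothing here unless a namespace manager is supplied, while the local-name() form suggested in the comments does match.

```powershell
# Hypothetical document whose elements are in a default namespace.
[xml]$doc = '<root xmlns="urn:example"><child>hello</child></root>'

# Plain names: no match, because root and child are in urn:example.
$doc.SelectNodes('/root[1]/child[1]').Count    # 0

# local-name() ignores the namespace, at the cost of a longer expression.
$doc.SelectNodes("/*[local-name()='root'][1]/*[local-name()='child'][1]").Count    # 1
```

Adding a namespace-uri() test to each step would make the path exact again, which is where the bloat mentioned above comes from.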