2

Here is a part of a jenkins xml file.

I want to extract the defaultValue of project_name with xpath.

I this case the value is *****.

<?xml version='1.0' encoding='UTF-8'?>
<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <description></description>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <description></description>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
 </project>

I use etree of python, but AFAIK this does not matter much since this is a xpath question.

My current xpath knowledge is limited. My current approach:

for name_tag in config.findall('.//name'):
    if name_tag.text=='project_host':
        default=name_tag.getparent().findall('defaultValue')[0].text

Here I get AttributeError: 'Element' object has no attribute 'getparent'

I thought about this again, and I think that looping in python is the wrong approach. This should be selectable via xpath.

guettli
  • 25,042
  • 81
  • 346
  • 663
  • 1
    Please also show your Python code and explain why your approach does not work as expected. Thanks. More help: http://stackoverflow.com/help/mcve. – Mathias Müller Feb 03 '16 at 10:35
  • etree only supports a limited subset of XPath 1.0 : [19.7.2.2. Supported XPath syntax](https://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax). Use [`lxml`](http://lxml.de/xpathxslt.html) if you want to use XPath extensively – har07 Feb 03 '16 at 10:46
  • @MathiasMüller I added my current solution. – guettli Feb 03 '16 at 10:47
  • @har07 I can install lxml, no problem. But the xpath magic itself is unknown to me up to now. – guettli Feb 03 '16 at 10:48
  • In XPath you can use predicate expression (expression wrapped in `[]`) to filter context element i.e the element right before `[]`, with specific criteria, just like what the answers below demonstrate. – har07 Feb 03 '16 at 11:15

2 Answers2

2

The XPath answer to your question is

/project/properties/hudson.model.ParametersDefinitionProperty/parameterDefinitions/hudson.model.StringParameterDefinition[name = 'project_name']/defaultValue/text()

which will select as the only result

*****

Given that your actual document does not have a namespace. You do not need to access the parent element nor a sibling axis.

Even etree should support this kind of XPath expressions, but it might not - see the comment by har07.


I thought about this again, and I think that looping in python is the wrong approach. This should be selectable via xpath.

Yes, I agree. If you want to select a single value from a document, select it with an XPath expression and store it as a Python string directly, without looping through elements.


Full example with lxml

from lxml import etree
from StringIO import StringIO

document_string = """<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <description></description>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <description></description>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
 </project>"""

tree = etree.parse(StringIO(document_string))

result_list = tree.xpath("/project/properties/hudson.model.ParametersDefinitionProperty/parameterDefinitions/hudson.model.StringParameterDefinition[name = 'project_name']/defaultValue/text()")

print result_list[0]

Output:

*****
Community
  • 1
  • 1
Mathias Müller
  • 22,203
  • 13
  • 58
  • 75
  • @ Mathias Muller If your xpath returns python list then how will you construct text from them?Could you show - i need to correct me in case of web testing using selenium etc? – Learner Feb 03 '16 at 11:17
  • @SIslam The return value of XPath expressions, in this case, depends on the Python function that you are invoking. For instance, an lxml function might return a list. But it is inaccurate to say that my XPath expression returns a Python list. If the result is indeed a list, then extracting the strings from a list of strings is trivial in Python. Finally, what does Selenium testing have to do with this question? – Mathias Müller Feb 03 '16 at 11:24
  • Ah! I practice selenium(python) thus i need correct path to follow- I meant lxml returns python literal (i.e. list) BTW I see you used `text()` in your xpath and said not to use loop so now could you please show how to get text from the lxml object that returned by your xpath for multiple nodes and finally i am eager to see an working example. – Learner Feb 03 '16 at 11:33
  • I just saw your edit- could you show how to handle the situation without looping if there were more that one same node (can be extracted with same xpath ) that your xpath selected e.g. multiple `defaultValue` after multiple `project_name` – Learner Feb 03 '16 at 11:36
  • @SIslam As far as I can see, you are not the person who has asked the question above. If you have a new question, then please ask a question in its own right (i.e. open a new post). If there are multiple results, the list is just extended with more values. (I might not really have understood what you are asking, though.) – Mathias Müller Feb 03 '16 at 11:38
  • @Matias I just wanted to understand your quote `I think that looping in python is the wrong approach.` but failed for my limited KEN and so asked for general working example. Forget about me, is not it better to be solved whereas future reader of this post get the right way? – Learner Feb 03 '16 at 11:43
  • @SIslam This is not my quote, it is a quote of what the OP themselves said. Anyway, _I don't understand what you are asking_. I have already answered to the best of my knowledge what I **think** you are asking, but it seems I did not understand. So, please, stop writing lengthy and unintelligible comments. – Mathias Müller Feb 03 '16 at 11:47
1

You can try lxml.etree as below- I used looping to select all nodes that have same position.

Examples of needed xpaths are - I used relative xpath since it is very useful incase of long node path.

.//hudson.model.StringParameterDefinition/name[contains(text(),'project_name')]/following-sibling::defaultValue

OR

.//hudson.model.StringParameterDefinition/name[contains(text(),'project_name')]/following::defaultValue[1]

from lxml import etree as et

data  = """<?xml version='1.0' encoding='UTF-8'?>
<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <description></description>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <description></description>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
 </project>"""

tree = et.fromstring(data)

print [i.text for i in tree.xpath(".//hudson.model.StringParameterDefinition/defaultValue")]
print [i.text for i in tree.xpath(".//hudson.model.StringParameterDefinition/name[contains(text(),'project_name')]/following-sibling::defaultValue")]
print [i.text for i in tree.xpath(".//hudson.model.StringParameterDefinition/name[contains(text(),'project_name')]/following::defaultValue[1]")]

Output-

['my_customer', '*****']
['*****']
['*****']
Community
  • 1
  • 1
Learner
  • 5,192
  • 1
  • 24
  • 36