I'm really at a loss on this. I'm trying to search the text of the `MF22` elements in an XML file using XPath and `contains()`. It works fine when I put the search string directly in the `contains()` call, but when I try to pass it in as a Python variable, the query returns every `MF22` element in the file.

from lxml import etree as ElementTree
ET = ElementTree.parse('USFLMEO_USSHARE_60200.txt')

bcnIDstr = "AB"
test1 = ET.xpath("//MF22[text()[contains(.,bcnIDstr)]]")
print 'found ' + str(len(test1)) + ' packets'

test2 = ET.xpath("//MF22[text()[contains(.,'AB')]]")
print 'found ' + str(len(test2)) + ' packets'

for elem in test1:
    packet = elem.getparent()
    for elem2 in packet:
        print elem2.tag, elem2.text 

In the code above, `test2` finds only the matching elements, but `test1` matches every `MF22` element. The XML data is below.

<?xml version="1.0" ?>
<topMessage>
    <header dest="366Z" orig="USFLMEO" number="60200" date="2015-10-02T00:00:59.000000000Z" />
    <message>
        <packetsMessage>
            <packet>
                <MF6>324</MF6>
                <MF11>3669</MF11>
                <MF71>2</MF71>
                <MF22>9C634E2AB509240</MF22>
                <MF77>FFFE2FCE31A7155A849207E5B34027500004</MF77>
                <MF67>15 275 0000 40.147870</MF67>
                <MF68>406033830.154</MF68>
                <MF69>0.000000</MF69>
                <MF70>99999.999</MF70>
                <MF72>45.1169</MF72>
                <MF73>399.987</MF73>
                <MF74>0000</MF74>
            </packet>
            <packet>
                <MF6>318</MF6>
                <MF11>3669</MF11>
                <MF71>1</MF71>
                <MF22>9C634E2AB509240</MF22>
                <MF77>FFFE2FCE31A7155A849207E5B34027500004</MF77>
                <MF67>15 275 0000 40.147850</MF67>
                <MF68>406033830.673</MF68>
                <MF69>0.000000</MF69>
                <MF70>99999.999</MF70>
                <MF72>40.0184</MF72>
                <MF73>400.066</MF73>
                <MF74>0000</MF74>
            </packet>
            <packet>
                <MF6>324</MF6>
                <MF11>3669</MF11>
                <MF71>2</MF71>
                <MF22>9C02BE29630F0A0</MF22>
                <MF77>FFFE2FCE015F14B18785039DABCE5A4EC14F</MF77>
                <MF67>15 275 0000 42.922460</MF67>
                <MF68>406033518.783</MF68>
                <MF69>0.000000</MF69>
                <MF70>99999.999</MF70>
                <MF72>41.5108</MF72>
                <MF73>400.053</MF73>
                <MF74>0000</MF74>
            </packet>
            <packet>
                <MF6>315</MF6>
                <MF11>3669</MF11>
                <MF71>3</MF71>
                <MF22>9C02BE29630F0A0</MF22>
                <MF77>FFFE2FCE015F14B18785039DABCE5A4EC14F</MF77>
                <MF67>15 275 0000 42.924905</MF67>
                <MF68>406038122.646</MF68>
                <MF69>0.000000</MF69>
                <MF70>99999.999</MF70>
                <MF72>41.0458</MF72>
                <MF73>399.815</MF73>
                <MF74>0000</MF74>
            </packet>
        </packetsMessage>
    </message>
</topMessage>

Thanks in advance!

Jesse Reich
  • Also, I'm still very new to XML and Python, so please feel free to tear me a new one on anything I could be doing a better way. – Jesse Reich Dec 11 '15 at 03:01
  • I just found it. I swear I searched for an hour on this; should've searched for an hour and 5 minutes! It's answered [here](http://stackoverflow.com/questions/26297410/how-to-cast-a-variable-in-xpath-python); the correct code is `test1 = ET.xpath('//MF22[text()[contains(.,"%s")]]' % bcnIDstr)` – Jesse Reich Dec 11 '15 at 03:06

2 Answers

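You never inserted the value of `bcnIDstr` into the XPath string, so XPath sees the bare name `bcnIDstr` instead of the literal `'AB'`. Concatenate the value into the expression as a quoted string: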

test1 = ET.xpath("//MF22[text()[contains(.,'"+bcnIDstr+"')]]")

Your XPath can also be shortened, since each `MF22` element contains only text:

test1 = ET.xpath("//MF22[contains(.,'"+bcnIDstr+"')]")

Alternatively, use string formatting:

test1 = ET.xpath("//MF22[text()[contains(.,'{0}')]]".format(bcnIDstr))
test1 = ET.xpath("//MF22[contains(.,'{0}')]".format(bcnIDstr))
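
To see why the original `test1` matched everything, here is a minimal sketch. It assumes Python 3 with lxml installed and uses a trimmed two-packet copy of the question's data rather than the real file. Inside an XPath string, a bare `bcnIDstr` is read as a child element name. No such element exists, so it converts to the empty string, and `contains()` is true for any text when the search string is empty.

```python
from lxml import etree

# Trimmed stand-in for the question's file (assumption: two packets are enough).
XML = b"""<topMessage>
  <packet><MF22>9C634E2AB509240</MF22></packet>
  <packet><MF22>9C02BE29630F0A0</MF22></packet>
</topMessage>"""

root = etree.fromstring(XML)
bcnIDstr = "AB"

# Bare name inside the XPath string: XPath looks for a child element
# named <bcnIDstr>, finds none, and so tests contains(., "").
unquoted = root.xpath("//MF22[text()[contains(., bcnIDstr)]]")

# Python value spliced in as a quoted XPath string literal.
quoted = root.xpath("//MF22[text()[contains(., '{0}')]]".format(bcnIDstr))

print(len(unquoted), len(quoted))
```

The first query returns both `MF22` elements, while the second returns only the one whose text contains `AB`.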
Parfait
  • Thanks! You not only helped me solve this problem, you also helped with another issue that came up: storing the tag name 'MF22' in a variable and passing it to the XPath expression. This was driving me crazy. I can't quite explain it, but apparently the `{}` doesn't need the surrounding `' '` when it is outside the `contains()` call. The following code worked: `packets = ET.xpath("//{}[contains(.,'{}')]".format(searchtag,bcnIDstr))` – Jesse Reich Dec 11 '15 at 20:10
  • This did not work for me! Instead I used `'%s` and then `% variable_name` and that worked, but this did not. I have also seen that using `$` might do the job. – M.K Jul 11 '19 at 08:31
  • Please ask a separate question. The modulo operator, `%`, for string formatting is [discouraged in Python](https://stackoverflow.com/a/54277607/1422451) in favor of `str.format`. We need context for your situation. – Parfait Jul 11 '19 at 13:10

lxml has a specific feature for referencing a Python variable from an XPath expression:

bcnIDstr = "AB"
test1 = ET.xpath("//MF22[text()[contains(.,$foo)]]", foo=bcnIDstr)

Documentation: http://lxml.de/xpathxslt.html#the-xpath-method

This is especially useful when the Python variable holds something other than a string, such as an XML element, in which case you can't build the expression with string operations. One example of such a scenario: No nested nodes. How to get one piece of information and then to get additional info respectively?
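
XPath variables also help when the search text itself contains a quote character, which would break a string-formatted expression. The sketch below assumes Python 3 with lxml. The helper name `find_by_text`, the trimmed sample data, and the `O'BRIEN` value are hypothetical and only for illustration. Because an XPath variable can stand in for a value but not for an element name, the tag is matched through `local-name()`.

```python
from lxml import etree

# Trimmed sample data; the third packet is a made-up value containing a quote.
XML = b"""<topMessage>
  <packet><MF22>9C634E2AB509240</MF22><MF77>FFFE2FCE31A7</MF77></packet>
  <packet><MF22>9C02BE29630F0A0</MF22><MF77>FFFE2FCE015F</MF77></packet>
  <packet><MF22>O'BRIEN</MF22><MF77>none</MF77></packet>
</topMessage>"""

root = etree.fromstring(XML)


def find_by_text(root, tag, needle):
    """Return elements named `tag` whose text contains `needle`."""
    # $tag and $needle are bound from the keyword arguments below,
    # so no manual quoting or escaping is needed.
    return root.xpath(
        "//*[local-name() = $tag][contains(., $needle)]",
        tag=tag,
        needle=needle,
    )


ab_hits = find_by_text(root, "MF22", "AB")
quote_hits = find_by_text(root, "MF22", "O'B")

# The same search built with str.format produces a malformed expression,
# because the apostrophe ends the XPath string literal early.
try:
    root.xpath("//MF22[contains(., '{0}')]".format("O'B"))
    formatting_failed = False
except etree.XPathError:
    formatting_failed = True

print(len(ab_hits), len(quote_hits), formatting_failed)
```

For fixed, trusted search strings, the formatting approach is fine. Variables are the safer choice when the value comes from user input or other data you don't control.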

har07