
I have this XML string:

<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI" xmlns:libraries="http://www.ibm.com/websphere/appserver/schemas/5.0/libraries.xmi">
  <libraries:Library xmi:id="Library_1382473016602" name="sfi_lib" isolatedClassLoader="false">
    <classPath>${HOME_SFI_LIB}/sfi_com_sqw_java.jar</classPath>
  </libraries:Library>
  <libraries:Library xmi:id="Library_1528914932212" name="sfi_lib_server" isolatedClassLoader="false">
    <classPath>${HOME_SFI_LIB}/jasper/jasperreports-5.6.0.jar</classPath>
    <classPath>${HOME_SFI_LIB}/jasper/jasperreports-fonts-3.7.4.jar</classPath>
    <classPath>${HOME_SFI_LIB}/commons/commons-beanutils-1.8.2.jar</classPath>
    <classPath>${HOME_SFI_LIB}/commons/commons-collections-3.2.1.jar</classPath>
    <classPath>${HOME_SFI_LIB}/commons/commons-digester-2.1.jar</classPath>
    <classPath>${HOME_SFI_LIB}/commons/commons-discovery-0.2.jar</classPath>
    <classPath>${HOME_SFI_LIB}/commons/commons-logging-1.1.1.jar</classPath>
    <classPath>${HOME_SFI_LIB}/commons/xml-apis.jar</classPath>
    <classPath>${HOME_SFI_LIB}/commons/iText-2.1.7.jar</classPath>
    <classPath>${HOME_SFI_LIB}/jasper/barbecue-1.5-beta1.jar</classPath>
    <classPath>${HOME_SFI_LIB}/bouncycastle/bcprov-jdk15-1.45.jar</classPath>
    <classPath>${HOME_SFI_LIB}/bouncycastle/bcmail-jdk15-1.45.jar</classPath>
    <classPath>${HOME_SFI_LIB}/bouncycastle/bctsp-jdk14-1.45.jar</classPath>
    <classPath>${HOME_SFI}/sfi_arquivos/templates</classPath>
    <classPath>${HOME_SFI_LIB}/sfi_framework_java.jar</classPath>
    <classPath>${HOME_SFI_LIB}/sfi_adm_ama_java.jar</classPath>
    <classPath>${HOME_SFI_LIB}/sfi_adm_gce_java.jar</classPath>
    <classPath>${HOME_SFI_LIB}/sfi_adm_gdl_java.jar</classPath>
    <classPath>${HOME_SFI_LIB}/sfi_adm_prt_java.jar</classPath>
    <classPath>${HOME_SFI_LIB}/sfi_com_acg_java.jar</classPath>
    <classPath>${HOME_SFI_LIB}/sfi_com_sca_java.jar</classPath>
    <classPath>${HOME_SFI_LIB}/sfi_com_tge_java.jar</classPath>
    <classPath>${HOME_SFI_LIB}/sfi_com_utl_java.jar</classPath>
    <classPath>${HOME_SFI_LIB}/sfi_ext_sge_java.jar</classPath>
  </libraries:Library>
</xmi:XMI>

What I'm trying to do is get the values of the elements that start with ${HOME_SFI_LIB}/sfi_. I'm using Python's re module to do the work. My current code filters only by the classPath tag, which is not enough. The regular expression I'm currently using:

re.findall('<classPath>(.*?)</classPath>', xml)

Can someone help me improve my regex so that it matches only the elements whose text starts with ${HOME_SFI_LIB}/sfi_, like the node <classPath>${HOME_SFI_LIB}/sfi_adm_gce_java.jar</classPath>?

flpn
  • XML and regex are not good friends. Use a parser, it is simpler, faster and much more maintainable. – Toto Mar 21 '19 at 14:26
  • @Toto while I agree the linked post directly *addresses* the question, I wouldn't go so far as to say it *answers* it. There are probably better duplicates for the python-specific use case – C.Nivs Mar 21 '19 at 15:23

1 Answer


As this post famously points out, it is better to use an XML parser such as lxml to process markup languages such as XML, HTML, and XHTML:

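from lxml import etree

# Open in binary mode so the parser applies the encoding declared in the file
with open('your_file.xml', 'rb') as fh:
    tree = etree.parse(fh)

# Now you have an ElementTree instance that you can search for tags.
# An XPath selector returns a list of matching elements.
class_paths = tree.xpath('//classPath')

for c in class_paths:
    # c.text is None for an empty element; startswith() checks the prefix only
    if c.text and c.text.startswith('${HOME_SFI_LIB}/sfi_'):
        print(c.text)  # rest of your code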

While you could argue that a regex approach can work for a simple XML document, in general a tree-based parser makes this much easier to extend to larger and more complicated documents.
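For the simple-document case, a regex sketch could look like the following. It assumes that classPath never carries attributes and that its text contains no nested markup; `SAMPLE_XML`, `PREFIX`, and `PATTERN` are names made up for this illustration, and the sample is a shortened stand-in for the XML in the question.

```python
import re

# Shortened sample; stands in for the XML string from the question.
SAMPLE_XML = """\
<libraries:Library name="sfi_lib_server">
  <classPath>${HOME_SFI_LIB}/jasper/jasperreports-5.6.0.jar</classPath>
  <classPath>${HOME_SFI}/sfi_arquivos/templates</classPath>
  <classPath>${HOME_SFI_LIB}/sfi_framework_java.jar</classPath>
  <classPath>${HOME_SFI_LIB}/sfi_adm_gce_java.jar</classPath>
</libraries:Library>
"""

# re.escape() neutralises the regex metacharacters in the literal prefix ($, {, }).
PREFIX = re.escape('${HOME_SFI_LIB}/sfi_')

# [^<]* keeps the match inside a single element instead of running across tags.
PATTERN = re.compile('<classPath>(' + PREFIX + '[^<]*)</classPath>')

matches = PATTERN.findall(SAMPLE_XML)
print(matches)
```

The unescaped `$`, `{`, and `}` are the usual stumbling block here, since all three have a special meaning in a regular expression.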

Edit

If you can't pip install lxml, the xml package is part of the standard library and works in a fairly similar fashion:

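from xml.etree import ElementTree as ET

with open('your_file.xml') as fh:
    tree = ET.parse(fh)

for element in tree.iterfind('.//classPath'):
    # element.text is None for an empty element; startswith() checks the prefix only
    if element.text and element.text.startswith('${HOME_SFI_LIB}/sfi_'):
        print(element.text)  # rest of your code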
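Since the question starts from an XML string rather than a file, here is a minimal sketch of the same idea using `ET.fromstring`. The helper name `sfi_class_paths`, the `PREFIX` constant, and the shortened `sample` document are assumptions made for this illustration, not part of the original code.

```python
from xml.etree import ElementTree as ET

PREFIX = '${HOME_SFI_LIB}/sfi_'

def sfi_class_paths(xml_text):
    """Return the text of every classPath element that starts with PREFIX."""
    root = ET.fromstring(xml_text)
    return [
        el.text
        for el in root.iterfind('.//classPath')
        # el.text is None for an empty element, so guard before startswith()
        if el.text and el.text.startswith(PREFIX)
    ]

# Shortened sample with the same namespaces as the document in the question.
sample = """\
<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:libraries="http://www.ibm.com/websphere/appserver/schemas/5.0/libraries.xmi">
  <libraries:Library name="sfi_lib">
    <classPath>${HOME_SFI_LIB}/sfi_com_sqw_java.jar</classPath>
    <classPath>${HOME_SFI_LIB}/commons/xml-apis.jar</classPath>
    <classPath>${HOME_SFI}/sfi_arquivos/templates</classPath>
  </libraries:Library>
</xmi:XMI>
"""

print(sfi_class_paths(sample))
```

The classPath elements carry no namespace prefix, which is why the plain `.//classPath` path finds them even though the surrounding elements are namespaced.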

C.Nivs
  • The problem is that I can't install any library on the server I'm working on. That's why I'm trying to do this in pure Python – flpn Mar 21 '19 at 14:19
  • Then the `xml.etree` package should do nicely; it's built in. I'll edit my answer – C.Nivs Mar 21 '19 at 14:21