-1

My xml looks like the following :

<example>
<Test_example>Author%5773637864827/Testing-75873874hdueu47.jpg</Test_example>
<Test_example>Auth0r%5773637864827/Testing245-75873874hdu6543u47.ts</Test_example>

This XML has 100 lines and i am interested in the tag "<Test_example>". In this tag I want to remove everything until it sees a / and when it sees a - remove everything until it sees the full stop.

End result should be

<Test_example>Testing.jpg</Test_example>
<Test_example>Testing245.ts</Test_example>

I am a beginner and would love some help on this. I think maybe regex is the best method?

  • Need to mention i am only interested in locating the tag wherever it is mentioned in the XML – newbie_786 Mar 26 '20 at 14:38
  • Does this answer your question? [How do I parse XML in Python?](https://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python) – Uyghur Lives Matter Mar 26 '20 at 15:40
  • the python attempted is not good enough so i haven't added it. the .jpg and .ts needs to be retained . I ma just looking at how i can find a specific tag and remove everything till it sees a '/' and removes everything which starts with a ' -' till it sees a full stop. – newbie_786 Mar 26 '20 at 16:11

1 Answers1

1

Consider XSLT, the special-purpose language designed to to transform XML files, using its substring-before and substring-after functions. Python's third-party module, lxml, can run XSLT 1.0 scripts. And because XSLT is portable, it can be run in other languages or executables beyond Python:

XSLT (save as .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Test_example">
    <xsl:copy>
      <xsl:value-of select="concat(substring-before(substring-after(., '/'), '-'), 
                                   '.',
                                   substring-after(., '.'))"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Python

import lxml.etree as et

xml = et.parse('Input.xml')
xsl = et.parse('Script.xsl')

transformer = et.XSLT(xsl)
new_xml = transformer(xml)

# PRINT TO CONSOLE
print(new_xml)

# SAVE TO FILE
with open('Output.xml', 'wb') as f:
   f.write(new_xml)

Output

<?xml version="1.0" encoding="UTF-8"?>
<example>
   <Test_example>Testing.jpg</Test_example>
   <Test_example>Testing245.ts</Test_example>
</example>

Python Demo

XSLT Demo

Parfait
  • 104,375
  • 17
  • 94
  • 125