I have this XML as a string, returned by a DB query as a CLOB and converted using an OutputTypeHandler method, which returns the content of the CLOB element in a tuple:

This is the code that returns the tuple from clob content:

import cx_Oracle

def OutputTypeHandler(cursor, name, defaultType, size, precision, scale):
    # Fetch CLOB columns as long strings instead of LOB objects
    if defaultType == cx_Oracle.CLOB:
        return cursor.var(cx_Oracle.LONG_STRING, arraysize=cursor.arraysize)

This is the code where the XML tree is built after the tuple returned by OutputTypeHandler is converted to a string:

import xml.etree.ElementTree as ET

conn.outputtypehandler = OutputTypeHandler
c = conn.cursor()
c.execute("""select Clob from Table""") 

clobData = c.fetchone()
str =  ''.join(clobData) #saving the new string value as str
root = ET.fromstring(str) #building the xml Tree using xml.etree.ElementTree as ET
ET.dump(root)

Resulting XML message is (replica of the XML in the DB) :

<Parent>
<Batch_Number>2000</Batch_Number>
<Total_No_Of_Batches>12312</Total_No_Of_Batches>
<requestNo>1923</requestNo>
<Parent1>
    <Parent2>
        <Parent3>
                <lastModifiedDateTime>2022-11-11T11:07:30.000</lastModifiedDateTime>
                <purpose>NeverMore</purpose>
                <endDate>9999-12-31T00:00:00.000</endDate>
                <createdDateTime>2019-06-06T06:32:16.000</createdDateTime>
                <createdOn>2019-06-06T08:32:16.000</createdOn>
                <address2>Forever street 21</address2>
                <externalCode>home</externalCode>
                <lastModifiedBy>user2.thisUser</lastModifiedBy>
                <lastModifiedOn>2039-06-11T13:07:30.000</lastModifiedOn>
                <lastModifiedBy>MG</lastModifiedBy>
                <PS>1234431</PS>
        </Parent3>
    </Parent2>
</Parent1>
</Parent>

Here is where I'm trying to look into every value of every child/grandchild of the XML until I find a specific value:

for child in root:
    if child.text == 'MG':
        print(child.text)
    else:
        print("Value not found")

The result is really strange, and I don't understand where it's coming from:

<Parent>
<Batch_Number>2000</Batch_Number>
<Total_No_Of_Batches>12312</Total_No_Of_Batches>
<requestNo>1923</requestNo>
<Parent1>
    <Parent2>
        <Parent3>
                <lastModifiedDateTime>2022-11-11T11:07:30.000</lastModifiedDateTime>
                <purpose>NeverMore</purpose>
                <endDate>9999-12-31T00:00:00.000</endDate>
                <createdDateTime>2019-06-06T06:32:16.000</createdDateTime>
                <createdOn>2019-06-06T08:32:16.000</createdOn>
                <address2>Forever street 21</address2>
                <externalCode>home</externalCode>
                <lastModifiedBy>user2.thisUser</lastModifiedBy>
                <lastModifiedOn>2039-06-11T13:07:30.000</lastModifiedOn>
                <lastModifiedBy>MG</lastModifiedBy>
                <PS>1234431</PS>
        </Parent3>
    </Parent2>
</Parent1>
</Parent>

Value not found
Value not found
Value not found
Value not found

If I only print every child found in root:

for child in root:
    print(child)

The result is :

*Whole XML*
<Element 'Batch_Number' at 0x05203E10>
<Element 'Total_No_Of_Batches' at 0x05203E70>
<Element 'requestNo' at 0x05203EA0>
<Element 'Parent1' at 0x05203ED0>

I did try another approach:

element = root.find('MG')

if not element:
    print("element not found, or element has no subelements")

if element is None:
    print("element not found")

The result was the same: the full XML was printed and no element was found:

*WholeXML*
element not found, or element has no subelements
element not found

I'm not sure what I'm doing wrong. I assume that the XML tree built from the string is faulty and somehow is not being parsed tag by tag.

MGA
  • You need to create a recursive function that gets a node and a string (the value to search for). Then the function should iterate over the children and check if their value is the same. If not, then if the child has children, it should call itself with the child as the node. – בנימין כהן Apr 13 '20 at 08:17
  • Thanks for ur answer, indeed, I didn't think of it like that, cheers! – MGA Apr 13 '20 at 08:39
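
The recursive function described in the comment above could look roughly like the sketch below. The name `find_by_text` and the trimmed-down XML are assumptions made for illustration and are not part of the original post; the function returns the first element whose text equals the value, or None.

```python
import xml.etree.ElementTree as ET

def find_by_text(node, value):
    """Return the first descendant of `node` whose text equals `value`, else None.

    Hypothetical helper, not part of ElementTree.
    """
    for child in node:
        # Strip whitespace so indentation inside the XML does not affect the match.
        if child.text is not None and child.text.strip() == value:
            return child
        # No match on this child: search its own children.
        found = find_by_text(child, value)
        if found is not None:
            return found
    return None

# Trimmed-down stand-in for the XML in the question.
xml_text = """<Parent>
<requestNo>1923</requestNo>
<Parent1><Parent2><Parent3>
    <lastModifiedBy>user2.thisUser</lastModifiedBy>
    <lastModifiedBy>MG</lastModifiedBy>
</Parent3></Parent2></Parent1>
</Parent>"""

root = ET.fromstring(xml_text)
match = find_by_text(root, "MG")
print(match.tag if match is not None else "Value not found")
```

Unlike the loop in the question, this prints a single result for the whole tree rather than one "Value not found" per direct child of the root.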

1 Answer

lastModifiedBy is embedded in Parent3, which is itself embedded in Parent2 and Parent1 - that's why you won't find a text matching MG in your approach.

If you'd like to continue with this approach, you need to define a function that checks every child recursively, descending whenever an element has children of its own.

Please refer to: ElementTree - findall to recursively select all child elements
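
As an alternative to writing the recursion by hand, ElementTree's `Element.iter()` already walks an element and all of its descendants. The sketch below is only an illustration on a shortened copy of the XML from the question. It also shows why `root.find('MG')` returns None: `find()` matches tag names (or paths), not text content.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (assumed well-formed).
xml_text = """<Parent>
<Batch_Number>2000</Batch_Number>
<Parent1><Parent2><Parent3>
    <purpose>NeverMore</purpose>
    <lastModifiedBy>MG</lastModifiedBy>
    <PS>1234431</PS>
</Parent3></Parent2></Parent1>
</Parent>"""

root = ET.fromstring(xml_text)

# find() looks for a child *tag* named MG, which does not exist here.
print(root.find("MG"))

# iter() visits root and every nested element in document order,
# so the text comparison reaches lastModifiedBy inside Parent3.
matches = [el for el in root.iter() if (el.text or "").strip() == "MG"]
for el in matches:
    print(el.tag, el.text)
```

Because `matches` is a list, an empty result can be reported once after the loop instead of once per element.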

Artur Kasza
  • Thanks for ur help mate! I opted for thefourtheye generator recursive method, it returns `` , is there a way to return the actual name of the tag (child) for the specific value? Or even the full path, any would do – MGA Apr 13 '20 at 08:35
  • I'm not really proficient in using XML with Python, but the first approach I can think about is following the path by appending it in your recursive function. Try to dig into XMLElementTree docs. – Artur Kasza Apr 13 '20 at 08:40
  • After further analyzing the solution presented in the link that you provided, I came to the conclusion that it's not what I'm looking for: the recursive function only searches the XML for a specific tag name, and I need to search the content of the tags for a specific value... It is still very helpful, don't get me wrong, but I need to figure it out first, then I can mark the thread as solved – MGA Apr 13 '20 at 09:21
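
For the follow-up in these comments (searching by content and getting the tag name or full path back), one possible sketch is a generator that carries the path along while it recurses, as suggested above. The name `paths_with_text` and the slash-separated path format are assumptions for illustration, and `yield from` assumes Python 3.3 or later.

```python
import xml.etree.ElementTree as ET

def paths_with_text(node, value, path=""):
    """Yield (path, element) for `node` and each descendant whose text equals `value`.

    Hypothetical helper; the path is built from tag names joined by '/'.
    """
    current = path + "/" + node.tag
    if (node.text or "").strip() == value:
        yield current, node
    for child in node:
        # Pass the accumulated path down so each match knows where it sits.
        yield from paths_with_text(child, value, current)

# Shortened copy of the XML from the question (assumed well-formed).
xml_text = """<Parent>
<requestNo>1923</requestNo>
<Parent1><Parent2><Parent3>
    <lastModifiedBy>user2.thisUser</lastModifiedBy>
    <lastModifiedBy>MG</lastModifiedBy>
</Parent3></Parent2></Parent1>
</Parent>"""

root = ET.fromstring(xml_text)
results = list(paths_with_text(root, "MG"))
for path, element in results:
    print(path, element.tag)
```

Each result carries both the element (so `element.tag` gives the tag name) and the path from the root. Repeated sibling tags such as the two lastModifiedBy elements share the same path string, so an index would have to be added if they need to be told apart.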