1

I want to get correct output userID , itemID and related balance in the each bill and export the result

I'm getting repeated itemID/userID with below code:

each 'user' could have many 'items' and each item has a balance. userID can be repeated for each item

    userid = node.findtext('./userID')
    itemids = node.findall('./bill/item/itemID')
    bills = node.findall(".//bill/balance")
      
   for item in itemids:
      for bill in bills:
         print(userid, item.text, bill.text) 

here is example of XML

<user>
    <userID>10269</userID>
    <name>
        <displayName>SAFIYA NASSER ABDULLAH AL SIYABI</displayName>
        <firstName>SAFIYA</firstName>
        <middleName>NASSER ABDULLAH</middleName>
        <lastName>AL SIYABI</lastName>
    </name>
    <library>MAIN</library>
    <numberOfBills>3</numberOfBills>
    <bill>
        <item>
            <callNumber>BP173.4 .B57 2003</callNumber>
            <copyNumber>1</copyNumber>
            <itemID>423999</itemID>
            <library>MAIN</library>
            <dateCreated>2009-02-15</dateCreated>
            <isPermanent>true</isPermanent>
        </item>
        <amount currency="OR">1.20</amount>
        <reason>OVERDUE</reason>
        <balance currency="OR">1.20</balance>
        <library>MAIN</library>
    </bill>
    <bill>
        <item>
            <callNumber>BP173.3 .G423 2004</callNumber>
            <copyNumber>2</copyNumber>
            <itemID>429053</itemID>
            <library>MAIN</library>
            <dateCreated>2009-02-15</dateCreated>
            <isPermanent>true</isPermanent>
        </item>
        <amount currency="OR">1.20</amount>
        <reason>OVERDUE</reason>
        <balance currency="OR">1.20</balance>
        <library>MAIN</library>
    </bill>
    <bill>
        <item>
            <callNumber>BP173.3 .N34 2003</callNumber>
            <copyNumber>1</copyNumber>
            <itemID>423991</itemID>
            <library>MAIN</library>
           <dateCreated>2009-02-15</dateCreated>
            <isPermanent>true</isPermanent>
        </item>
        <amount currency="OR">24.00</amount>
        <reason>OVERDUE</reason>
        <balance currency="OR">24.00</balance>
        <library>MAIN</library>
    </bill>
</user>

Thank in advanced

2 Answers2

2

You're iterating over every <item>, and for each item, you then iterate over every <bill> from the beginning. You're basically using the length of node.findall('.//itemID') as the number of times to iterate over all the bill tags, which isn't want you want.

Iterate over every bill, then in a nested for loop, iterate over the items found under that specific bill, instead of every item in the document.

for bill in node.findall('bill'):
    balance = bill.find('balance')
    for item in bill.findall('item'):
        itemID = item.find('itemID')
Jacob Walls
  • 873
  • 3
  • 15
0

Consider a list/dict comprehension to extract selected XML data:

import xml.etree.ElementTree as et

doc = et.parse("Input.xml")

user_bill_list_of_dict = [{'userID': doc.findtext('userID'),
                           'itemID': b.find('item').findtext('itemID'),
                           'balance': b.findtext('balance')
                          } for b in doc.findall('bill')]
         
print(user_bill_list_of_dict)
# [{'userID': '10269', 'itemID': '423999', 'balance': '1.20'}, 
#  {'userID': '10269', 'itemID': '429053', 'balance': '1.20'}, 
#  {'userID': '10269', 'itemID': '423991', 'balance': '24.00'}]

You can even extend for all XML data using dictionary merging (available Python 3.5+):

data = [{**{'userID': doc.findtext('userID')},
         **{n.tag:n.text for n in doc.findall('./name/*')},
         **{i.tag:i.text for i in bill.findall('item/*')},
         **{b.tag:b.text for b in bill.findall('*') if b.tag != 'item'},
        } for bill in doc.findall('bill')]

print(data)
# [{'userID': '10269', 'displayName': 'SAFIYA NASSER ABDULLAH AL SIYABI', 
#   'firstName': 'SAFIYA', 'middleName': 'NASSER ABDULLAH', 'lastName': 'AL SIYABI', 
#   'callNumber': 'BP173.4 .B57 2003', 'copyNumber': '1', 'itemID': '423999', 
#   'library': 'MAIN', 'dateCreated': '2009-02-15', 'isPermanent': 'true', 'amount': '1.20',
#   'reason': 'OVERDUE', 'balance': '1.20'}, 
#  {'userID': '10269', 'displayName': 'SAFIYA NASSER ABDULLAH AL SIYABI', 
#   'firstName': 'SAFIYA', 'middleName': 'NASSER ABDULLAH', 'lastName': 'AL SIYABI', 
#   'callNumber': 'BP173.3 .G423 2004', 'copyNumber': '2', 'itemID': '429053', 
#   'library': 'MAIN', 'dateCreated': '2009-02-15', 'isPermanent': 'true', 'amount': '1.20',
#    'reason': 'OVERDUE', 'balance': '1.20'}, 
# {'userID': '10269', 'displayName': 'SAFIYA NASSER ABDULLAH AL SIYABI', 
#  'firstName': 'SAFIYA', 'middleName': 'NASSER ABDULLAH', 'lastName': 'AL SIYABI',
#  'callNumber': 'BP173.3 .N34 2003', 'copyNumber': '1', 'itemID': '423991', 
#  'library': 'MAIN', 'dateCreated': '2009-02-15', 'isPermanent': 'true', 'amount': '24.00',
#  'reason': 'OVERDUE', 'balance': '24.00'}]

Even more, above data can be migrated to Pandas data frame:

import pandas as pd
...

df = pd.DataFrame(data)

#   userID                       displayName firstName       middleName   lastName          callNumber copyNumber  itemID library dateCreated isPermanent amount   reason balance
# 0  10269  SAFIYA NASSER ABDULLAH AL SIYABI    SAFIYA  NASSER ABDULLAH  AL SIYABI   BP173.4 .B57 2003          1  423999    MAIN  2009-02-15        true   1.20  OVERDUE    1.20
# 1  10269  SAFIYA NASSER ABDULLAH AL SIYABI    SAFIYA  NASSER ABDULLAH  AL SIYABI  BP173.3 .G423 2004          2  429053    MAIN  2009-02-15        true   1.20  OVERDUE    1.20
# 2  10269  SAFIYA NASSER ABDULLAH AL SIYABI    SAFIYA  NASSER ABDULLAH  AL SIYABI   BP173.3 .N34 2003          1  423991    MAIN  2009-02-15        true  24.00  OVERDUE   24.00
Parfait
  • 104,375
  • 17
  • 94
  • 125