Parsing XML using Python and create an excel report - Elementree/lxml

Question

I am trying to parse many XML test results files and get the necessary data like testcase name, test result, failure message etc to an excel format. I decided to go with Python.

My XML file is a huge file and the format is as follows. The cases which failed has a message, & and the passed ones only has . My requirement is to create an excel with testcasename, test status(pass/fail), test failure message.

<?xml version="1.0" encoding="UTF-8"?>
<testsuites xmlns:a="http://microsoft.com/schemas/VisualStudio/TeamTest/2006"
            xmlns:b="http://microsoft.com/schemas/VisualStudio/TeamTest/2010">
  <testsuite name="MSTestSuite" tests="192" time="0" failures="16" errors="0" skipped="0">
    <testcase classname="dfsgsgg" name="Results are displayed" time="27.8096966">
      <failure message="unknown error: jQuery is not defined&#xA;">  
      </failure>
      <system-out>Given the user is on the landing page
      -&gt; error: unknown error: jQuery is not defined
      </system-out>
      <system-err>unknown error: jQuery is not defined          
      </system-err>
    </testcase>
    <testcase classname="dfsgsgg" name="Results are displayed" time="27.8096966">
      <failure message="unknown error: jQuery is not defined&#xA;"> 
      </failure>
      <system-out>Given the user is on the landing page
      -&gt; error: unknown error: jQuery is not defined
      </system-out>
      <system-err>unknown error: jQuery is not defined          
      </system-err>
    </testcase>                                                               
    <testcase classname="dfsgsgg" name="Results are displayed" time="27.8096966">
      <failure message="unknown error: jQuery is not defined&#xA;"> 
      </failure>
      <system-out>Given the user is on the landing page
      -&gt; error: unknown error: jQuery is not defined
      </system-out>
      <system-err>unknown error: jQuery is not defined          
      </system-err>
    </testcase>                                                           
    <testcase classname="dfsgsgg" name="Results are displayed" time="27.8096966">
      <system-out>Given the user is on the landing page
      -&gt; error: unknown error: jQuery is not defined
      </system-out>
    </testcase>
  </testsuite>
</testsuites>

I have come up with the following code. Please bear if there are any basic mistakes as I am very new to this. With this code I can retrieve test case name, class name but I am unable to pick the failure message, system-out and system-err. Though these tags are also part of testcase tag, I am not able to fetch it. Can someone help me through this? Thanks! With only testcase name and class name, I am able to write to an excel.

## Parsing XML files ###

import os
import pandas as pd
from lxml import etree

df_reports = pd.DataFrame()
df = pd.DataFrame()
i = 0
pass_count = 0
fail_count = 0

path = '/TestReports_Backup/'
files = os.listdir(path)
print(len(files))

for file in files:
    file_path = path+file
    print(file_path)
    tree = etree.parse(file_path)
    testcases = tree.xpath('.//testcase')
    systemout = tree.xpath('.//testcase/system-out')
    failure = tree.xpath('.//testcase/failure')

    for testcase in testcases:
        test = {}
        test['TestCaseName'] = testcase.attrib['name']
        test['Classname'] = testcase.attrib['classname']
        test['TestStatus'] = failure.attrib['message']

        df = pd.DataFrame(test, index=[i])
        i = i + 1
        df_reports = pd.concat([df_reports, df])
        print(df_reports)

df.head()
df_reports.to_csv('/TestReports_Backup/Reports.csv')

Actually this is file is autogenerated through regression testing. — user16120973, Jun 15 '21 at 15:07
In pandas, you should never need to assign an empty `DataFrame`. [Never call `DataFrame.append` or `pd.concat` inside a for-loop. It leads to quadratic copying.](https://stackoverflow.com/a/36489724/1422451) — Parfait, Jun 15 '21 at 15:50
@Parfait I guess I have edited the XML wrong. I did not want to copy paste the whole XML as its official data. Also, my Python code is able to read the test case name and classname and writes it to an excel. Trouble now is I am unable to fetch the failure message/system-out from the XML. — user16120973, Jun 15 '21 at 16:15

Parfait · Accepted Answer · 2021-06-16T15:36:04.243

Since your XML is relatively flat, consider a list/dictionary comprehension to retrieve all child elements and attrib dictionary. From there, call pd.concat once outside the loop. Below runs a dictionary merge (Python 3.5+).

path = "/TestReports_Backup"

def proc_xml(file_path):
    tree = etree.parse(os.path.join(path, file_path))
    
    data = [
        { **n.attrib, 
          **{k:v for el in n.xpath("*") for k,v in el.attrib.items()},
          **{el.tag: el.text.strip() for el in n.xpath("*") if el.text.strip()!=''}
        } for n in tree.xpath("//testcase")
    ] v
        
    return pd.DataFrame(data)
    
df_reports = pd.concat([
    proc_xml(f) 
    for f in os.listdir(path) 
    if f.endswith(".xml")
])

output

  classname                   name        time                                 message                                         system-out                            system-err
0   dfsgsgg  Results are displayed  27.8096966  unknown error: jQuery is not defined\n  Given the user is on the landing page\n      -...  unknown error: jQuery is not defined
1   dfsgsgg  Results are displayed  27.8096966  unknown error: jQuery is not defined\n  Given the user is on the landing page\n      -...  unknown error: jQuery is not defined
2   dfsgsgg  Results are displayed  27.8096966  unknown error: jQuery is not defined\n  Given the user is on the landing page\n      -...  unknown error: jQuery is not defined
3   dfsgsgg  Results are displayed  27.8096966                                     NaN  Given the user is on the landing page\n      -...                                   NaN

Also, starting in Pandas v1.3, there is now an available read_xml (default parser being lxml and defaults to retrieve all attributes and child elements at specific xpath):

path = "/TestReports_Backup"

df_reports = pd.concat([
    pd.read_xml(os.path.join(path, f), xpath="//testcase") 
    for f in os.listdir(path) 
    if f.endswith(".xml")
])

Thank you. The first solution works good for me. As I am having Pandas version 1.2.4, second one is not working for me. — user16120973, Jun 15 '21 at 20:59
What is wrong with my code? Why am I not able to fetch 'message' attribute in failure tags? @Parfait — user16120973, Jun 16 '21 at 14:23
I adjusted solution to locate failure node which has no text but single attribute. — Parfait, Jun 16 '21 at 15:13
You will need to ask a new question to not conflate this one. Be sure to do research and make an attempt before you ask. Happy coding! — Parfait, Jun 18 '21 at 00:16

Parsing XML using Python and create an excel report - Elementree/lxml

1 Answers1

Linked