I need a way to extract the name and the result of every test case from an HTML file. I'm new to BeautifulSoup and would appreciate some advice.

So far I have used BeautifulSoup to read the data, prettify it, and write the result to a file:

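from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are not read as escapes.
with open(r"C:\DEV\debugkod\data.html", "rb") as html_file:
    soup = BeautifulSoup(html_file)

# prettify() returns text; encode it and write the bytes out as UTF-8.
with open('myfile', 'wb') as f:
    f.write(soup.prettify().encode('utf-8'))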

Part of the prettified output in the file looks like this, for example (the file contains hundreds of test cases and results):

<a name="1005">
  </a>
  <div class="Sequence">
   <div class="Header">
    <table class="Title">
     <tr>
      <td>
       IAA REQPROD 55 InvPwrDownMode - Shut down communication (Sequence)
      </td>
      <td class="ResultStateIcon">
       <img src="Resources/Passed.png"/>
      </td>
     </tr>
    </table>
    <table class="DynamicAttributes">
     <colgroup>
      <col width="20">
       <col width="30">
        <col width="20">
         <col width="30">
         </col>
        </col>
       </col>
      </col>
     </colgroup>
     <tr>
      <th>
       Start time:
      </th>
      <td>
       2014/09/23 09-24-31
      </td>
      <th>
       Stop time:
      </th>
      <td>
       2014/09/23 09-27-25
      </td>
     </tr>
     <tr>
      <th>
       Execution duration:
      </th>
      <td>
       173.461 sec.
      </td>
      <th>
       Name:
      </th>
      <td>
       IAA REQPROD 55 InvPwrDownMode - Shut down communication
      </td>
     </tr>
     <tr>
      <th>
       Library link:
      </th>
      <td>
      </td>
      <th>
       Creation date:
      </th>
      <td>
       2013/4/11, 8-55-57
      </td>
     </tr>
     <tr>
      <th>
       Modification date:
      </th>
      <td>
       2014/9/23, 9-27-25
      </td>
      <th>
       Author:
      </th>
      <td>
       cnnntd
      </td>
     </tr>
     <tr>
      <th>
       Hierarchy:
      </th>
      <td>
       IAA.  IAA REQPROD 55 InvPwrDownMode - Shut down communication
      </td>
      <td>
      </td>
      <td>
      </td>
     </tr>
    </table>
    <table class="StaticAttributes">
     <colgroup>
      <col width="20">
       <col width="80">
       </col>
      </col>
     </colgroup>
     <tr>
      <th>
       Description:
      </th>
      <td>
      </td>
     </tr>
     <tr>
      <th>
       Result state:
      </th>
      <td>
       Passed
      </td>
     </tr>
    </table>
   </div>
   <div class="BlockReport">
    <a name="1007">

In this file I now want to find the values for "Name:" and "Result state:". When I check the prettified output I can see the labels "Name:" and "Result state:", so hopefully it is possible to use them to find the test case name and the test result. The printout should look something like this:

 Name = IAA REQPROD 55 InvPwrDownMode - Shut down communication 
 Result = Passed
 etc

Does anyone know how to do this using BeautifulSoup?

BioGeek
martin

1 Answer

Using the html from your second Pastebin link, the following code:

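from bs4 import BeautifulSoup

# Open in binary mode so BeautifulSoup can detect the encoding itself.
with open("beautifulsoup2.html", "rb") as html_file:
    soup = BeautifulSoup(html_file)

# The first <td> of each "Title" table holds the test case name.
names = []
for table in soup.find_all('table', attrs={'class': 'Title'}):
    td = table.find('td')
    # Drop non-ASCII characters, then decode back to text.
    names.append(td.text.encode("ascii", "ignore").decode("ascii").strip())

# The second <td> of each "StaticAttributes" table holds the result state.
results = []
for table in soup.find_all(attrs={'class': 'StaticAttributes'}):
    tds = table.find_all('td')
    results.append(tds[1].text.strip())

for name, result in zip(names, results):
    print("Name = {}".format(name))
    print("Result = {}".format(result))
    print("")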

Gives this result:

Name = IEM(Project)
Result = PassedFailedUndefinedError

Name = IEM REQPROD 132765 InvPwrDownMode - Shut down communication SN1(Sequence)
Result = Passed

Name = IEM REQPROD 86434 InvPwrDownMode - Time from shut down to sleep SN2(Sequence)
Result = PassedUndefined

Name = IEM Test(Sequence)
Result = Failed

Name = IEM REQPROD 86434 InvPwrDownMode - Time from shut down to sleep(Sequence)
Result = Error

I added the encode("ascii", "ignore") because otherwise I would get UnicodeDecodeErrors. See this answer for how those characters possibly ended up in your html.
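
The question asks whether the "Name:" and "Result state:" labels themselves can be used for the lookup. A minimal sketch of that idea follows, assuming each label sits in a `<th>` that is directly followed by its value in a sibling `<td>`, and that every test case is wrapped in a `<div class="Sequence">` as in the excerpt. `SAMPLE_HTML`, `value_for_label` and `extract_results` are hypothetical names made up for this illustration, and the sample is a trimmed-down stand-in for the real report, so the real file may need adjustments (for example, its top-level entry appears to be a project rather than a sequence).

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down sample modelled on the excerpt in the question.
SAMPLE_HTML = """
<div class="Sequence">
  <table class="DynamicAttributes">
    <tr><th>Execution duration:</th><td>173.461 sec.</td>
        <th>Name:</th><td>IAA REQPROD 55 InvPwrDownMode - Shut down communication</td></tr>
  </table>
  <table class="StaticAttributes">
    <tr><th>Description:</th><td></td></tr>
    <tr><th>Result state:</th><td>Passed</td></tr>
  </table>
</div>
"""


def value_for_label(container, label):
    """Return the text of the <td> that follows the <th> whose text is `label`."""
    for th in container.find_all("th"):
        if th.get_text(strip=True) == label:
            td = th.find_next_sibling("td")
            return td.get_text(strip=True) if td is not None else None
    return None  # label not present in this container


def extract_results(html):
    """Collect one (name, result) pair per <div class="Sequence">."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for seq in soup.find_all("div", class_="Sequence"):
        pairs.append((value_for_label(seq, "Name:"),
                      value_for_label(seq, "Result state:")))
    return pairs


for name, result in extract_results(SAMPLE_HTML):
    print("Name = {}".format(name))
    print("Result = {}".format(result))
```

Because the name and the result are read from the same container, they cannot drift out of step the way two separately built lists combined with `zip` can when one table is missing.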

Community
BioGeek
  • Thanks for helping! This code results in the following error: name = td.text.strip() AttributeError: 'NoneType' object has no attribute 'text' – martin Sep 23 '14 at 11:19
  • That means there is no tag like `<table>` with the `class` attribute `Title`. So your real file is different from the sample you provided. Can you put the full file on, for example, pastebin?
    – BioGeek Sep 23 '14 at 11:24
  • Check out http://pastebin.com/d6wzzxzW. I used this code to extract the data: `soup = BeautifulSoup(open("C:\DEV\debugkod\data.html")) fixedSoup = soup.prettify() fixedSoup = fixedSoup.encode('utf-8') myfile.write(fixedSoup)` – martin Sep 23 '14 at 11:43
  • The data.html file looks like this http://pastebin.com/0vpJ2RGE if I just open it in Notepad++ – martin Sep 23 '14 at 11:48
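
The AttributeError reported in the comments is what Python raises when `find` returns `None` and the code then reads `.text` from it. Which table in the real file triggers this is not known from the thread, so the following is only a defensive sketch: `SAMPLE_HTML` and `title_names` are hypothetical, and the second table in the sample deliberately contains no `<td>`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second "Title" table has no <td> at all.
SAMPLE_HTML = """
<table class="Title"><tr><td>IEM Test (Sequence)</td></tr></table>
<table class="Title"><tr><th>Header only</th></tr></table>
"""


def title_names(html):
    """Return the first-<td> text of every Title table, skipping tables without one."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for table in soup.find_all("table", class_="Title"):
        td = table.find("td")
        if td is None:
            # find() returns None when nothing matches; skip instead of crashing.
            continue
        names.append(td.get_text(strip=True))
    return names


print(title_names(SAMPLE_HTML))
```

Skipping silently is only one option; logging the offending table would make it easier to see how the real file differs from the sample.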