For the following piece of HTML, I used BeautifulSoup to extract the table contents:
<table>
<tr>
<td><b>Code</b></td>
<td><b>Display</b></td>
</tr>
<tr>
<td>min</td>
<td>Minute</td><td/>
</tr>
<tr>
<td>happy </td>
<td>Hour</td><td/>
</tr>
<tr>
<td>daily </td>
<td>Day</td><td/>
</tr>
</table>
This is my code:
comments = [td.get_text() for td in table.findAll("td")]
Comments=[data.encode('utf-8') for data in comments]
As you can see, the table has two headers, "Code" and "Display", followed by rows of values. I expected my code to produce ['Code', 'Display', 'min', 'Minute', 'happy ', 'Hour', 'daily ', 'Day'],
but this is the actual output:
['Code', 'Display', 'min', 'Minute', '', 'happy ',
'Hour', '', 'daily ', 'Day', '']
The output has '' at the 5th, 8th, and 11th positions of comments, and these empty strings do not correspond to any value defined in the table. I think it may be because of the </td><td/> at the end of each data row.
How can I change the code so that it does not capture '' in the output?
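A minimal sketch of one possible approach, assuming the empty strings do come from the empty <td/> cells: filter out any cell whose text is blank after stripping whitespace. The helper name non_empty is hypothetical, and the input list is copied from the output shown above so that the snippet runs without the HTML.

```python
def non_empty(cells):
    """Return only the cell texts that contain something besides whitespace."""
    # The original strings are kept as-is (e.g. the trailing space in 'happy ');
    # strip() is used only to decide whether a cell is blank.
    return [text for text in cells if text.strip()]


# The list produced by the code in the question.
comments = ['Code', 'Display', 'min', 'Minute', '', 'happy ',
            'Hour', '', 'daily ', 'Day', '']

filtered = non_empty(comments)
print(filtered)
```

The same condition could presumably be placed directly in the original comprehension, for example `[td.get_text() for td in table.findAll("td") if td.get_text().strip()]`, assuming `table` is the parsed table element from the question.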