I am having trouble parsing HTML in Python. I'm specifically looking for a way to do this with regex, not for reasons why I shouldn't use regex. Other approaches might solve this better, but unfortunately my requirements mean I cannot use other modules or libraries. Thanks for the help.
I have the following HTML:
<tbody ID='archive'>
<tr><td valign="top">Type / Path</td>
<td colspan=2>CIFS / 10.5.0.5:/selva</td>
</tr>
<tr><td valign="top">Last availability</td>
<td colspan=2>1970-01-01 05:30:00</td>
</tr>
<tr><td valign="top">Capacity Internal / Archive</td>
<td colspan=2>3.7 / 10.0 GByte</td>
</tr>
<tr><td valign="top">Blocks To sync / Transferred / Lost</td>
<td colspan=2>951 / 0 / 15 (last 24 hours)</td>
</tr>
<tr><td valign="top">Bandwidth Available / Total usage</td>
<td colspan=2>0 kB/s / 0 kB/s</td>
</tr>
<tr><td valign="top">Buffer Usage / Capacity left</td>
<td colspan=2>100 % / 0 m</td>
</tr>
</tbody>
<tr bgcolor="#CCCCCC"><th onclick="showhide(this,'events')" align=left colspan=3 width="style: auto;">▽ Event and Action Setup</th></tr>
<tbody ID='events'>
<tr>
<td>Arming</td>
<td>Enabled</td>
</tr>
<tr>
<td>Events</td>
<td colspan=2>PI MI AS UC TimeSync </td>
</tr>
<tr>
<td>Actions</td>
<td colspan=2>(IP) REC FR</td>
</tr>
</tbody>
I need to get the number that follows the Buffer Usage label (line 17 in the code above); in this case it is 100, from "100 %" (line 18 in the code above). This number can have 1 to 3 digits.
How do I get this number extracted from the code above in Python?
The reason I need to do this is so I can send out an email if the buffer is above 10%. I can code that part, but I don't know how to extract the information from the HTML above.
The code will be run on a NAS box, so it would be ideal if the solution used only the Python standard library.
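To make the goal concrete, here is a minimal sketch of the kind of extraction I have in mind, using only the standard `re` module. The pattern, the `BUFFER_RE` name, and the `extract_buffer_usage` helper are my own assumptions for illustration, and it relies on the value cell directly following the "Buffer Usage" label cell as in the HTML above; it may well break if the page layout changes.

```python
import re

# Abbreviated copy of the relevant rows from the HTML above.
html = """<tr><td valign="top">Buffer Usage / Capacity left</td>
<td colspan=2>100 % / 0 m</td>
</tr>"""

# Hypothetical pattern: match the "Buffer Usage" label cell, skip to the
# next <td ...> cell, and capture the 1 to 3 digits that precede the '%'.
BUFFER_RE = re.compile(
    r"Buffer Usage[^<]*</td>\s*<td[^>]*>\s*(\d{1,3})\s*%",
    re.IGNORECASE,
)

def extract_buffer_usage(page):
    """Return the buffer usage as an int, or None if it is not found."""
    match = BUFFER_RE.search(page)
    return int(match.group(1)) if match else None

usage = extract_buffer_usage(html)
if usage is not None and usage > 10:
    # Placeholder for the email step, which is outside this question.
    print("Buffer usage is %d%%, above the 10%% threshold" % usage)
```

The function returns `None` rather than raising when the label is missing, so the caller can decide what to do if the page format is not what was expected.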