-1

I have a question about extracting text with Python regular expressions. I want to do this with regex only, without using an HTML parsing module such as bs4.

Here is the example text:

tr_range =

<tr>
    <td class="table-basic-l">
        Resolution
    </td>
    <td class="table-basic-l">
        Horizontal Frequency (kHz)
    </td>
    <td class="table-basic-l">
        Vertical Frequency (Hz)
    </td>
</tr>

I'd like to extract the text inside every td element, that is Resolution, Horizontal Frequency (kHz), and Vertical Frequency (Hz), using regex only.

I am trying to exclude the opening tag of each td element, but I haven't managed it so far.

Tân
Cho

1 Answer

2

You can get the text by removing the HTML tags with a regex like this (it only works for table markup such as tr and td tags, because the pattern matches only tags whose names start with t):

import re

html='<tr>'\
    '<td class="table-basic-l">'\
    '    Resolution'\
    '</td>'\
    '<td class="table-basic-l">'\
    '    Horizontal Frequency (kHz)'\
    '</td>'\
    '<td class="table-basic-l">'\
    '    Vertical Frequency (Hz)'\
    '</td>'\
'</tr>'

# Remove every opening or closing tag whose name starts with "t" (tr, td, ...).
print(re.sub(r"</?t.*?>", "", html))
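
If you need each cell as a separate item rather than one joined string, a capture group with `re.findall` may be closer to what the question asks for. This is only a sketch: it assumes the td elements are not nested and that each one has a matching closing tag, and the names `cell_pattern` and `cells` are made up for illustration.

```python
import re

# Same markup as in the question, kept as a multi-line string.
tr_range = """
<tr>
    <td class="table-basic-l">
        Resolution
    </td>
    <td class="table-basic-l">
        Horizontal Frequency (kHz)
    </td>
    <td class="table-basic-l">
        Vertical Frequency (Hz)
    </td>
</tr>
"""

# Capture everything between each opening <td ...> tag and its closing </td>.
# re.DOTALL lets "." span the line breaks inside a cell.
cell_pattern = re.compile(r"<td\b[^>]*>(.*?)</td>", re.DOTALL | re.IGNORECASE)

# Strip the indentation and newlines around each captured cell.
cells = [text.strip() for text in cell_pattern.findall(tr_range)]
print(cells)
# ['Resolution', 'Horizontal Frequency (kHz)', 'Vertical Frequency (Hz)']
```

The non-greedy `(.*?)` stops at the first `</td>`, so each cell is captured on its own instead of everything from the first opening tag to the last closing tag.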
zypro