How to extract text using regex in Python?

Question

I have a question about how to extract some text using Python regex. I would like to do this with regex only, without using an HTML module such as bs4.

The example text is as follows.

tr_range =

<tr>
    <td class="table-basic-l">
        Resolution
    </td>
    <td class="table-basic-l">
        Horizontal Frequency (kHz)
    </td>
    <td class="table-basic-l">
        Vertical Frequency (Hz)
    </td>
</tr>

I'd like to extract all the text inside the td elements, i.e. Resolution, Horizontal Frequency (kHz), and Vertical Frequency (Hz), using regex only.

I am trying to exclude the opening tag of each td element, but it has not been easy so far.

yeah i am able to solve this using HTML parser, but i want to know if it is possible to use regex — Cho, Nov 06 '18 at 07:06
Seriously, just use Beautiful Soup. Don't waste your time or others'. https://www.crummy.com/software/BeautifulSoup/ — Alex Reynolds, Nov 06 '18 at 07:20

zypro · Accepted Answer · 2018-11-06T07:43:08.427

You can get the text by removing the HTML tags with a regex like this (it only strips tags whose names start with t, such as tr and td):

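import re

# Adjacent string literals are concatenated into a single HTML string.
html = ('<tr>'
        '<td class="table-basic-l">'
        '    Resolution'
        '</td>'
        '<td class="table-basic-l">'
        '    Horizontal Frequency (kHz)'
        '</td>'
        '<td class="table-basic-l">'
        '    Vertical Frequency (Hz)'
        '</td>'
        '</tr>')

# Remove every opening or closing tag whose name starts with "t" (tr, td, ...).
print(re.sub(r"<[/]*t.*?>", "", html))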
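
If the goal is a list of the individual cell texts rather than one concatenated string, a capture group with re.findall may be closer to what the question asks for. The following is only a minimal sketch and not part of the original answer: the pattern and the helper name extract_cells are assumptions made for illustration, and it handles only simple td cells that contain no nested tags.

```python
import re

html = """<tr>
    <td class="table-basic-l">
        Resolution
    </td>
    <td class="table-basic-l">
        Horizontal Frequency (kHz)
    </td>
    <td class="table-basic-l">
        Vertical Frequency (Hz)
    </td>
</tr>"""

# Capture whatever sits between an opening <td ...> tag and the next </td>.
# re.DOTALL lets "." match the newlines inside each cell.
TD_PATTERN = re.compile(r"<td\b[^>]*>(.*?)</td>", re.DOTALL | re.IGNORECASE)


def extract_cells(fragment):
    """Hypothetical helper: return the stripped text of every td cell."""
    return [text.strip() for text in TD_PATTERN.findall(fragment)]


cells = extract_cells(html)
print(cells)
# ['Resolution', 'Horizontal Frequency (kHz)', 'Vertical Frequency (Hz)']
```

The lazy quantifier (.*?) stops at the first closing </td>, so each cell is captured separately. For nested or malformed markup, an HTML parser remains the safer choice.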

edited Nov 06 '18 at 07:43

answered Nov 06 '18 at 07:09


You almost got it, but what if there is < or > in the text? – Cho Nov 06 '18 at 07:19
Then use Beautiful Soup. – Alex Reynolds Nov 06 '18 at 07:19
@Cho shouldn't be a problem see here: regexr.com/42i67 – zypro Nov 06 '18 at 07:22
@Cho if you have only tr and td elements, then my edit might help you? – zypro Nov 06 '18 at 07:33
Yes, of course it is! Actually, I was shocked seeing your code. – Cho Nov 06 '18 at 07:35
@Cho why shocked? :-D – zypro Nov 06 '18 at 07:43
Because it is way better than I thought. – Cho Nov 06 '18 at 22:56
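
On the question raised in the comments about < or > appearing in the cell text, the behaviour of the answer's pattern can be checked with a small experiment. This is only a sketch: the sample row is made up for illustration, and replacing tags with a space before collapsing whitespace is an added assumption, not something the answer does.

```python
import re

# Hypothetical sample row: the cell text itself contains < and >.
html = ('<tr>'
        '<td class="table-basic-l">1 < 2</td>'
        '<td class="table-basic-l">3 > 2</td>'
        '</tr>')

# Same pattern as in the answer: "<", optional slashes, then a "t".
TAG_PATTERN = re.compile(r"<[/]*t.*?>")

# Replace each tag with a space, then collapse runs of whitespace.
text = " ".join(TAG_PATTERN.sub(" ", html).split())
print(text)
# 1 < 2 3 > 2
```

A lone < or > survives because the pattern requires a t (optionally preceded by /) immediately after the <. Text in which < is directly followed by t, for example "a <ten and b> c", would still be mangled, which is one reason the commenters recommend a parser.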
