Unable to accurately search for particular text in an HTML tag using Python

Question

I have the regex below to extract text from an HTML tag, but it doesn't yield the expected result.

HTML Tag:

<td>Issue Amount</td>
<td>:</td>
<td>20,000,000.00</td>

import re
Find = re.findall(r"(?<=Issue Amount</td> <td>:</td> <td>)[0-9,.]+", soup_string)[0]

I need to get the numerical value 20,000,000.00 from this tag.

Any advice on what I am doing wrong here? I did try a couple of other ways, but with no success.

Sounds like it has an answer [here](https://stackoverflow.com/questions/9833152/python-regular-expressions-extract-every-table-cell-content) or [here](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — vahdet, Mar 01 '19 at 14:43
Try this regex: `([\d,.]+)`. `[\d,.]+` matches one or more digits, commas or periods, and `()` makes it a capturing group. – nissim abehcera, Mar 01 '19 at 15:20
Thanks, nissim. It works; however, this is just part of the HTML body, and it might end up matching other values as well. Across the complete body, the regex should match only this part. – Shashi Shankar Singh, Mar 01 '19 at 15:31
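To illustrate the concern in the last comment: on its own, `([\d,.]+)` has no surrounding context, so it matches every run of digits, commas or periods in the body. The second row below (a `Coupon Rate` row) is made up purely for this sketch; the real page may contain different values.

```python
import re

# Hypothetical HTML body: the "Coupon Rate" row is invented to show the problem.
html = (
    "<td>Issue Amount</td>\n<td>:</td>\n<td>20,000,000.00</td>\n"
    "<td>Coupon Rate</td>\n<td>:</td>\n<td>5.25</td>\n"
)

# The bare pattern from the comment: one or more digits, commas or periods.
matches = re.findall(r"([\d,.]+)", html)
print(matches)  # ['20,000,000.00', '5.25']
```

Anchoring the pattern to the `Issue Amount` label (or using a parser) is what restricts it to the one value you want.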

score 2 · Answer 1 · answered Mar 01 '19 at 14:43

Do not under any circumstances try to parse XML with a regex unless you wish to invoke rite 666 Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.

Use an HTML parsing library instead; see this page for some ways to do it.
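The linked page did not survive, so here is a minimal sketch of the parser approach. It uses Python's built-in `html.parser.HTMLParser` rather than a third-party library, and it assumes the label, the colon and the value sit in three consecutive `<td>` cells, as in the question. The names `CellCollector` and `value_after_label` are hypothetical helpers for this example.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text content of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text between cells (e.g. the newlines) is ignored.
        if self._in_td:
            self.cells[-1] += data


def value_after_label(html, label):
    """Return the cell two positions after the cell whose text is `label`
    (label, ':', value), or None if the label is not found."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = [c.strip() for c in parser.cells]
    for i, text in enumerate(cells):
        if text == label and i + 2 < len(cells):
            return cells[i + 2]
    return None


html = "<td>Issue Amount</td>\n<td>:</td>\n<td>20,000,000.00</td>"
print(value_after_label(html, "Issue Amount"))  # 20,000,000.00
```

Because the lookup is keyed on cell text rather than on exact whitespace, line endings and indentation between the tags no longer matter. Since the question already has a `soup_string`, the BeautifulSoup equivalent would be to locate the `td` whose text is `Issue Amount` and step forward with `find_next_sibling("td")`.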

However, in your case you have mucked up your regex by looking for a space between your </td> and <td> tags, whereas your data has line breaks (carriage returns/newlines) there. You can use the \s metacharacter to match any whitespace character.
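A sketch of the fix described above. One catch worth knowing: Python's `re` only allows fixed-width lookbehinds, so `\s` works inside `(?<=...)` only when exactly one whitespace character sits between the tags (a `\r\n` pair is two), and `\s*` inside a lookbehind raises `re.error`. Switching from a lookbehind to a capturing group removes that limit.

```python
import re

html = "<td>Issue Amount</td>\n<td>:</td>\n<td>20,000,000.00</td>"

# The question's lookbehind expects a literal space between the tags,
# but the HTML has newlines there, so nothing matches.
spaced = re.findall(r"(?<=Issue Amount</td> <td>:</td> <td>)[\d,.]+", html)
print(spaced)  # []

# \s matches exactly one whitespace character (space, tab, \n, \r, ...).
# Lookbehinds must be fixed-width, so \s is allowed here but \s* is not.
one_ws = re.findall(r"(?<=Issue Amount</td>\s<td>:</td>\s<td>)[\d,.]+", html)
print(one_ws)  # ['20,000,000.00']

# A capturing group avoids the fixed-width limit, so \s* also copes with
# \r\n line endings or indentation between the cells.
pattern = re.compile(r"Issue Amount</td>\s*<td>:</td>\s*<td>([\d,.]+)</td>")
match = pattern.search(html.replace("\n", "\r\n"))
print(match.group(1) if match else None)  # 20,000,000.00
```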

score 0 · Accepted Answer · answered Mar 04 '19 at 08:07

0

Below is the regex that got me the desired output. Thanks, all, for your input.

(?<=Issue Amount[td\W]{21})([\d,.]+)
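A short check of how this pattern appears to work: `[td\W]{21}` matches exactly 21 characters, each of which is `t`, `d` or a non-word character, and that is just enough to span `</td>\n<td>:</td>\n<td>` when each line break is a single `\n`. Because the count is hard-coded, the match is brittle; the sketch below assumes `\r\n` line endings as the failing case, but any extra indentation would break it the same way.

```python
import re

pattern = re.compile(r"(?<=Issue Amount[td\W]{21})([\d,.]+)")

lf_html = "<td>Issue Amount</td>\n<td>:</td>\n<td>20,000,000.00</td>"
crlf_html = lf_html.replace("\n", "\r\n")

# The gap between "Issue Amount" and the number is exactly 21 characters
# only when each line break is a single "\n".
gap = len("</td>\n<td>:</td>\n<td>")
print(gap)  # 21
print(pattern.findall(lf_html))  # ['20,000,000.00']

# With "\r\n" the gap grows to 23 characters, so the fixed-width
# lookbehind no longer lines up and nothing is returned.
print(pattern.findall(crlf_html))  # []
```

This is why the comment below still recommends a proper HTML parser: it does not depend on counting characters between tags.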

answered Mar 04 '19 at 08:07

Shashi Shankar Singh

185
3
19

You STILL should use a proper HTML parser to parse HTML. – bruno desthuilliers Mar 04 '19 at 08:11
Sure @brunodesthuilliers... :) – Shashi Shankar Singh Mar 04 '19 at 08:20

Unable to accurately search a particular text in a html tag using Python

2 Answers2