
I have a text like this:

text='gn="right" headers="gr-Y10 gr-eps i36">121.11<\\/td><\\/tr><tr class="hr"><td colspan="12"><\\/td><\\/tr><tr>'

I would like to get the value 121.11 using regex out of it. So I did this:

import re
b=re.search('gr-Y10 gr-eps i36">(.*)<\\\\/td', text)
b.group(1)

and I got this as output:

'121.11<\\/td><\\/tr><tr class="hr"><td colspan="12">'

How can I get what I am really looking for, which is 121.11 instead of the line above?

TJ1

2 Answers

gr-Y10 gr-eps i36">(.*?)<\\\\/td
                     ^^

Make your * non-greedy by appending ?. A non-greedy quantifier stops at the first instance of <\\\\/td; otherwise it captures everything up to the last <\\\\/td.

See demo.

https://regex101.com/r/iS6jF6/2#python
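
To see the difference locally, here is a minimal sketch that runs both patterns against the sample string from the question. The patterns are written as raw strings, which is only an equivalent spelling of the same regex chosen for this sketch, and the variable names `greedy` and `lazy` are illustrative.

```python
import re

# Sample string from the question; in memory each "\\/" is a backslash followed by "/".
text = 'gn="right" headers="gr-Y10 gr-eps i36">121.11<\\/td><\\/tr><tr class="hr"><td colspan="12"><\\/td><\\/tr><tr>'

# In a raw string, \\ is the regex for one literal backslash.
greedy = re.search(r'gr-Y10 gr-eps i36">(.*)<\\/td', text)
lazy = re.search(r'gr-Y10 gr-eps i36">(.*?)<\\/td', text)

print(greedy.group(1))  # runs on to the last <\/td in the string
print(lazy.group(1))    # 121.11
```

The greedy version first consumes the rest of the string and then backtracks to the last place where the closing part can match, while the lazy version extends one character at a time and stops at the first one.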

vks

Knowing the source of the input data and taking into account it is HTML, here is a solution involving an HTML Parser, BeautifulSoup:

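import re
from bs4 import BeautifulSoup

# input_data holds the HTML source of the page
soup = BeautifulSoup(input_data, 'html.parser')

for row in soup.select('div#tab-growth table tr'):
    # keep only the cells whose headers attribute mentions gr-eps
    for td in row.find_all('td', headers=re.compile(r'gr-eps')):
        print(td.text)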

Basically, for every row in the "growth" table, we are finding the cells with gr-eps in headers ("EPS %" part of the table). It prints:

60.00
—
—
—
—
42.22
3.13
—
—
—
-498.46
...
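
If you want to try the approach without the real page, here is a small self-contained sketch. The HTML in `input_data` is a made-up stand-in (the `gr-rev i37` cell and its value are hypothetical), and it assumes the `bs4` package is installed; only the selector and the `headers` filter come from the code above.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: one "growth" table with two data cells.
input_data = """
<div id="tab-growth"><table>
<tr><td headers="gr-Y10 gr-eps i36">121.11</td><td headers="gr-Y10 gr-rev i37">8.40</td></tr>
<tr class="hr"><td colspan="12"></td></tr>
</table></div>
"""

soup = BeautifulSoup(input_data, 'html.parser')

# Collect the text of every cell whose headers attribute mentions gr-eps.
eps_values = [
    td.get_text()
    for row in soup.select('div#tab-growth table tr')
    for td in row.find_all('td', headers=re.compile(r'gr-eps'))
]
print(eps_values)  # ['121.11']
```

The separator row and the cell without `gr-eps` in its `headers` are skipped, so no pattern has to deal with where a closing tag ends.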

This is a good read also.

alecxe