-2

I have a sentence:

"Fourth-quarter 2021 net earnings per share (EPS) of $1.26, compared with 2020 EPS of $1.01; Fourth-quarter 2021 adjusted EPS of $1.11, down 25.5 percent compared with 2020 adjusted EPS of $1.49"

and would like to get the number $1.11 that follows the first occurrence of the substring "adjusted EPS".

The best regex I could come up with is:

re.search("^.*Adjusted EPS.*?(\$\d+.\d+).*", text,re.IGNORECASE).group(1)

but this gives me the number $1.49, which follows the second occurrence of "adjusted EPS".

How can I modify the search so I get the number $1.11?

martineau
PeterL

3 Answers

0

This regex should work (the dot is escaped so that it matches only a literal decimal point): /adjusted EPS of ?(\$\d+\.\d+)/g

Input:

Fourth-quarter 2021 net earnings per share (EPS) of $1.26, compared with 2020 
EPS of $1.01; Fourth-quarter 2021 adjusted EPS of $1.11, down 25.5 percent 
compared with 2020 adjusted EPS of $1.49

Output: adjusted EPS of $1.11, adjusted EPS of $1.49

Edit: Remove the g flag at the end of the regex to find only the first match.
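
Python has no /.../g literal, so here is a rough sketch of how the same idea might look with the re module. It assumes the sentence is stored in a variable named text, as in the question; re.search plays the role of the regex without the g flag and re.findall the role of the regex with it.

```python
import re

text = (
    "Fourth-quarter 2021 net earnings per share (EPS) of $1.26, "
    "compared with 2020 EPS of $1.01; Fourth-quarter 2021 adjusted EPS "
    "of $1.11, down 25.5 percent compared with 2020 adjusted EPS of $1.49"
)

# Raw string so the backslashes reach the regex engine unchanged.
pattern = r"adjusted EPS of ?(\$\d+\.\d+)"

# re.search stops at the first match (like dropping the g flag).
first = re.search(pattern, text, re.IGNORECASE).group(1)

# re.findall returns every captured amount (like keeping the g flag).
all_amounts = re.findall(pattern, text, re.IGNORECASE)

print(first)        # $1.11
print(all_amounts)  # ['$1.11', '$1.49']
```

Note that re.search returns None when nothing matches, so real code would probably want to check the result before calling .group(1).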

Exortions
0

You could use this pattern, which looks for "adjusted EPS" and allows only one "$" between it and the end of the line.

/adjusted EPS[^\$]+(\$\d+\.\d+)[^\$]+$/gm

The pattern without the delimiters and flags is:

adjusted EPS[^\$]+(\$\d+\.\d+)[^\$]+$
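
A small sketch of how this pattern behaves in Python, under the assumption that the m flag corresponds to re.MULTILINE. The variable names wrapped and single_line are made up for the illustration. Because the pattern is anchored to the end of the line, the result depends on where the line breaks fall:

```python
import re

pattern = re.compile(r"adjusted EPS[^\$]+(\$\d+\.\d+)[^\$]+$", re.MULTILINE)

# The input wrapped over three lines, as in the first answer.
wrapped = (
    "Fourth-quarter 2021 net earnings per share (EPS) of $1.26, compared with 2020 \n"
    "EPS of $1.01; Fourth-quarter 2021 adjusted EPS of $1.11, down 25.5 percent \n"
    "compared with 2020 adjusted EPS of $1.49"
)

# The same sentence as one unwrapped string, as in the question.
single_line = wrapped.replace(" \n", " ")

# Wrapped: no further "$" follows $1.11 on its line, so it matches there.
wrapped_result = pattern.search(wrapped).group(1)

# Unwrapped: a later "$" follows $1.11, so only the second occurrence can
# match, and the trailing [^\$]+ takes the last digit away from the amount.
single_result = pattern.search(single_line).group(1)

print(wrapped_result)  # $1.11
print(single_result)   # $1.4
```

So if the sentence arrives as one unwrapped string, a pattern that does not anchor to the end of the line, such as adjusted EPS.*?(\$\d+\.\d+), is probably the safer choice.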
-1

The problem is the greedy quantifier at the very beginning of your pattern:

^.*Adj ...

^ means the start of the string. Being greedy, .* "eats" as many characters as possible, up to the last "adjusted EPS".

There are two solutions here: either make it non-greedy (i.e. lazy), ^.*?Adj ..., or remove ^.* completely; I see no use for it here.
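
To make the difference concrete, here is a minimal sketch comparing the three variants on the sentence from the question. The variable names are only for illustration, and the dot in the amount is escaped, which the original pattern did not do.

```python
import re

text = (
    "Fourth-quarter 2021 net earnings per share (EPS) of $1.26, "
    "compared with 2020 EPS of $1.01; Fourth-quarter 2021 adjusted EPS "
    "of $1.11, down 25.5 percent compared with 2020 adjusted EPS of $1.49"
)

amount = r".*?(\$\d+\.\d+)"

# Greedy prefix: ^.* grabs everything, then backtracks only as far as
# needed, so it lands on the LAST "adjusted EPS".
greedy = re.search(r"^.*adjusted EPS" + amount, text, re.IGNORECASE).group(1)

# Lazy prefix: ^.*? expands only as far as needed, so it lands on the
# FIRST "adjusted EPS".
lazy = re.search(r"^.*?adjusted EPS" + amount, text, re.IGNORECASE).group(1)

# No prefix: re.search already scans from the left for the first match.
bare = re.search(r"adjusted EPS" + amount, text, re.IGNORECASE).group(1)

print(greedy)  # $1.49
print(lazy)    # $1.11
print(bare)    # $1.11
```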

nicael
  • Note it's `^.*adj...`. Nor does `.*` at the end serve a purpose. Perhaps `\badjusted EPS\b.*?(\$\d+.\d{2})`, the word boundaries to avoid matching, for example, `"readjusted EPS"` (probably not needed but does no harm). [Demo](https://regex101.com/r/8dDH5J/1) – Cary Swoveland Mar 28 '22 at 18:58