49

I need to extract the last number that is inside a string. I'm trying to do this with regex and negative lookaheads, but it's not working. This is the regex that I have:

\d+(?!\d+)

And these are some strings, just to give you an idea, and what the regex should match:

ARRAY[123]         matches 123 
ARRAY[123].ITEM[4] matches 4
B:1000             matches 1000
B:1000.10          matches 10

And so on. The regex does match numbers, but it matches all of them rather than only the last one. I don't get why the negative lookahead is not working. Anyone care to explain?

codaddict
korbes

4 Answers

128

Your regex \d+(?!\d+) says

match any number if it is not immediately followed by a number.

That is not the condition you want. A number is the last one if no other digit follows it anywhere in the rest of the string, not just immediately after it.

When translated to regex we have:

(\d+)(?!.*\d)
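As a rough illustration (not part of the original answer), the pattern can be tried in Python against the strings from the question. The helper name `last_number` is made up for this sketch.

```python
import re

# A run of digits that is not followed, anywhere later in the string,
# by another digit.
LAST_NUMBER = re.compile(r"(\d+)(?!.*\d)")

def last_number(text):
    """Return the last run of digits in text, or None if there is none."""
    match = LAST_NUMBER.search(text)
    return match.group(1) if match else None

for sample in ["ARRAY[123]", "ARRAY[123].ITEM[4]", "B:1000", "B:1000.10"]:
    print(sample, "->", last_number(sample))
# ARRAY[123] -> 123
# ARRAY[123].ITEM[4] -> 4
# B:1000 -> 1000
# B:1000.10 -> 10
```

Note that `.` does not match a newline by default, so for multi-line input the lookahead only inspects the rest of the current line unless `re.DOTALL` is passed.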


codaddict
  • +1, this is much cleaner than my `(?:\D|^)` mess ;-) (and closer to the OP's original regex too) – Cameron Mar 16 '11 at 04:48
  • Thanks for the explanation. I hadn't realized that I needed to include the .* for it to mean *not just immediately*. – korbes Mar 16 '11 at 12:31
  • Was banging my head against the desk on this one. Thank you for an elegant solution – Steven Garcia Oct 31 '12 at 11:18
  • Love this solution, because it's a good starting ground for any other "last x in string" needs. The format is `(regex)(?!.*(regex))` Personally I like looking for any decimal numbers so the regex I often use is: `((?:\d*\.)?\d+)(?!.*((?:\d*\.)?\d+))` – TheUnknownGeek Aug 03 '17 at 06:31
  • That's nice! But how do I match an expression that starts after the last digit but before a specific word? `f1rst number 77, 2 substring-that-I-need before KEYWORD 3 asd 555`? I would like to get this part `substring-that-I-need before ` – help-ukraine-now Jul 11 '19 at 20:25
11

I took it this way: you need to make sure the match is close enough to the end of the string; close enough in the sense that only non-digits may intervene. What I suggest is the following:

/(\d+)\D*\z/
  1. \z at the end means that that is the end of the string.
  2. \D* before that means that an arbitrary number of non-digits can intervene between the match and the end of the string.
  3. (\d+) is the matching part. It is in parentheses so that you can pick it up, as was pointed out by Cameron.
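
A minimal Python sketch of the same idea, offered only as an illustration. One assumption to flag: Python's `re` module spells the absolute end-of-string anchor `\Z`, so the sketch uses that in place of the `\z` shown above. The helper name `trailing_number` is hypothetical.

```python
import re

# Digits, then any number of non-digits, then the very end of the string.
# \Z is Python's spelling of the end-of-string anchor.
TRAILING_NUMBER = re.compile(r"(\d+)\D*\Z")

def trailing_number(text):
    """Return the digit run closest to the end of text, or None."""
    match = TRAILING_NUMBER.search(text)
    return match.group(1) if match else None

print(trailing_number("ARRAY[123].ITEM[4]"))  # 4
print(trailing_number("B:1000.10"))           # 10
```

Because `\D*` may also match newlines, this version is not limited to a single line the way a `.*`-based pattern is.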
sawa
10

You can use

.*(?:\D|^)(\d+)

to get the last number; this is because the matcher will gobble up all the characters with .*, then backtrack until it reaches a non-digit character (or the start of the string) that is followed by digits, then match that final group of digits.
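
Here is a small Python sketch of that behaviour, assuming `re.match` so the pattern is applied from the start of the string. For contrast it also runs the simpler `.*(\d+)`, which captures only the final digit because `.*` gives back as little as possible. Both helper names are invented for the example.

```python
import re

def last_number_greedy(text):
    """Capture the last digit run using .*(?:\\D|^)(\\d+)."""
    match = re.match(r".*(?:\D|^)(\d+)", text)
    return match.group(1) if match else None

def last_digit_only(text):
    """Simpler .*(\\d+): the greedy .* leaves just one digit for the group."""
    match = re.match(r".*(\d+)", text)
    return match.group(1) if match else None

print(last_number_greedy("B:1000"))     # 1000
print(last_number_greedy("B:1000.10"))  # 10
print(last_digit_only("B:1000"))        # 0
```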

Your negative lookahead isn't working because on the string "1 3", for example, the 1 is matched by the \d+, then the negative lookahead succeeds at the space (since what follows is not a sequence of one or more digits). The match is complete at that point, so the 3 is never even looked at.
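
This can be checked quickly in Python (a sketch for illustration; `all_matches` is a made-up helper). Collecting every match of the original pattern shows that each number qualifies on its own:

```python
import re

# The pattern from the question: digits not immediately followed by a digit.
ORIGINAL = re.compile(r"\d+(?!\d+)")

def all_matches(text):
    """Return every non-overlapping match of the original pattern."""
    return ORIGINAL.findall(text)

print(all_matches("1 3"))                 # ['1', '3']
print(all_matches("ARRAY[123].ITEM[4]"))  # ['123', '4']
```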

Note that your example regex doesn't have any groups in it, so I'm not sure how you were extracting the number.

Cameron
  • Just curious, why the `(?:\D|^)` bit of your regex? Doesn't `.*` handle it just fine? – jb. Mar 16 '11 at 03:43
  • @jb: Heh, I started out with that then had to delete my answer while I came up with `(?:\D|^)`. The problem with `.*(\d+)` is that only the last single digit will be matched (since the engine stops as soon as the regex is satisfied, which it will be after backtracking one digit character) – Cameron Mar 16 '11 at 04:04
  • If you somehow anchor from the beginning of the string as with your `.*`, you need your `(?:\D|^)`, or equivalently, `[\D\A]`. If you anchor from the end of the string, you do not need it, as in codaddict's answer or mine. – sawa Mar 16 '11 at 04:35
  • @sawa: Ooh, `\A`, I always forget about those anchors. Unfortunately, my Python 2.6 chokes on it when it's in a character class together with `\D` – Cameron Mar 16 '11 at 04:44
0

I still had issues with managing the capture groups (for example, if using Inline Modifiers (?imsxXU)).

This worked for my purposes -

.(?:\D|^)\d(\D)

Elysiumplain