
Is there any way to return the nth occurrence using a regular expression alone? I searched the forums and only found solutions that go beyond the regular expression itself (i.e. they need support from the programming language).

Example: Regex:

    (?:\$(\d+(?:,\d{3})*\.\d{2}))

Input:

    thiscanbeanything$25.74thiscanbesomethingelse
    alsowithnewlines$533.63thisonetoo$54.32plusthis$62.42thisneverends

I'd need to extract the first one (which is 25.74). Later on I might need to extract the third one (which is 54.32).

My regex currently matches all occurrences. I could pick the nth element out of the list of matches afterwards, but my question is: is it possible to do this with the regular expression only (i.e. the regular expression returns only the nth element I want)?
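For reference, the "retrieve the nth element after matching" approach mentioned above could look roughly like this. It is only a sketch: the question does not name a language, so Python is an assumption, and `AMOUNT`, `text`, and `nth_amount` are hypothetical names.

```python
import re

# Pattern from the question; group 1 captures the amount without the "$".
AMOUNT = re.compile(r"(?:\$(\d+(?:,\d{3})*\.\d{2}))")

# Sample input from the question, including the line break.
text = (
    "thiscanbeanything$25.74thiscanbesomethingelse\n"
    "alsowithnewlines$533.63thisonetoo$54.32plusthis$62.42thisneverends"
)

def nth_amount(s, n):
    """Return the nth (1-based) amount in s, or None if there are fewer than n."""
    # With a single capture group, findall returns a list of group-1 strings.
    matches = AMOUNT.findall(s)
    return matches[n - 1] if 0 < n <= len(matches) else None
```

Here the selection of the nth item happens in the host language, not in the regex, which is exactly what the question is trying to avoid.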

Thanks!

igorjrr
  • It can be done with the `^P{n-1}(P)` approach (where `P` is what you're trying to match), but it'll damage your code's readability. I think you're already using a good implementation. – Ivan Nevostruev May 09 '14 at 02:19
  • This also depends on what language you're using, but if the engine you're using fully supports lookbehind, that would be useful. Essentially you match your regex, but preceded by X other matches of it. – Kendall Frey May 09 '14 at 02:21
  • @KendallFrey Not working: `(?si)(?:(?<=\$[\d.]+){1})(\$[\d.]+)` – igorjrr May 09 '14 at 03:03

1 Answer


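For the nth match, use this pattern: `(?:.*?\$[0-9.]+){XX}.*?(\$[0-9.]+)`, where XX = n-1. The nth match is then in capture group 1. If the input spans several lines, as in the question, enable the dot-matches-newline flag so that `.*?` can cross line breaks.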

Example for 3rd match
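As a rough illustration of the pattern above, here is a minimal sketch in Python. Python is an assumption (the question does not name a language), and `text` and `nth_amount_regex` are hypothetical names. The sketch anchors the match at the start of the input with `re.match` and passes `re.DOTALL`, because the sample input contains a line break that `.` would otherwise not cross.

```python
import re

# Sample input from the question, including the line break.
text = (
    "thiscanbeanything$25.74thiscanbesomethingelse\n"
    "alsowithnewlines$533.63thisonetoo$54.32plusthis$62.42thisneverends"
)

def nth_amount_regex(s, n):
    """Return the nth (1-based) "$amount" in s using one regex, or None."""
    # Lazily skip n-1 amounts, then capture the nth one in group 1.
    pattern = r"(?:.*?\$[0-9.]+){%d}.*?(\$[0-9.]+)" % (n - 1)
    # re.match anchors at the start, so counting begins with the first amount.
    m = re.match(pattern, s, flags=re.DOTALL)
    return m.group(1) if m else None

print(nth_amount_regex(text, 1))  # $25.74
print(nth_amount_regex(text, 3))  # $54.32
```

The overall match still spans everything up to and including the nth amount, so the result has to be read from group 1 and not from the whole match.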

alpha bravo
  • It wasn't obvious to me that the example linked to was missing a number in the curly braces. Also, it seems that it will work with {n} in the curly braces, not {n-1}. Example: https://regex101.com/r/uE0vM5/2 – Alex Hall May 01 '15 at 17:37
  • But this highlights up to and including the nth element, how do you return the contents of only group 1? – Connor Aug 06 '19 at 21:06
  • You don't need the capture group if [the regex engine supports `\K`](https://stackoverflow.com/questions/13542950/support-of-k-in-regex): `(?:.*?\$[0-9.]+){XX}.*?\K\$[0-9.]+`. `\K` resets the starting point of the reported match to the current position in the string and discards all characters previously matched from the reported match. – Cary Swoveland Dec 30 '21 at 23:20