
I have the following file names and am looking to extract the number after "_R":

  • \FileName_R10.txt => 10
  • \FileName_R_10.txt => 10

I have successfully used the regex:

_R_?(\d+)\.txt$

Now I'm looking to adapt it to handle the following variations:

  • \FileName_R10_1.txt => 10
  • \FileName_R_10_1.txt => 10
  • \FileName_R10_11.txt => 10

I tried

_R_?(\d+)_?\d+?\.txt$

which seems to work for the latter examples but breaks on the first ones.

Thanks.

ndnenkov
alhazen

2 Answers

_R_?(\d+)(_\d+)?\.txt$

The problem you were having is that \d+? makes the repetition lazy rather than optional. It still has to match at least one digit; it just tries to match as few digits as possible (instead of as many as possible) for the overall match to succeed. For \FileName_R10.txt that forces the capturing group to give back a digit, so it captures 1 instead of 10.
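To make the lazy-versus-optional difference concrete, here is a minimal sketch that runs both patterns over the file names from the question. Python's re module is an assumption (the question does not name a regex flavor), and the helper name extract is hypothetical.

```python
import re

# File names taken from the question.
names = [
    r"\FileName_R10.txt",
    r"\FileName_R_10.txt",
    r"\FileName_R10_1.txt",
    r"\FileName_R_10_1.txt",
    r"\FileName_R10_11.txt",
]

# The attempt from the question: \d+? is lazy, but still needs one digit.
lazy = re.compile(r"_R_?(\d+)_?\d+?\.txt$")
# The suggested fix: the whole "_<digits>" suffix is optional.
optional = re.compile(r"_R_?(\d+)(_\d+)?\.txt$")


def extract(pattern, name):
    """Return the first capturing group, or None if there is no match."""
    match = pattern.search(name)
    return match.group(1) if match else None


lazy_results = [extract(lazy, n) for n in names]
optional_results = [extract(optional, n) for n in names]

print(lazy_results)      # ['1', '1', '10', '10', '10']
print(optional_results)  # ['10', '10', '10', '10', '10']
```

With the lazy pattern, the first two names still match, but the group has to give up its last digit so that \d+? has something to consume.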


EDIT: To group without introducing a second capturing group, you could use a non-capturing group, (?:...):
_R_?(\d+)(?:_\d+)?\.txt$
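As a quick check of what the non-capturing group changes, the sketch below compares the groups returned by both variants. Python's re module is again an assumption, and the dot before txt is escaped so that it matches only a literal period.

```python
import re

capturing = re.compile(r"_R_?(\d+)(_\d+)?\.txt$")
non_capturing = re.compile(r"_R_?(\d+)(?:_\d+)?\.txt$")

name = r"\FileName_R10_11.txt"

# groups() lists every capturing group in the pattern, in order.
groups_capturing = capturing.search(name).groups()
groups_non_capturing = non_capturing.search(name).groups()

print(groups_capturing)      # ('10', '_11')
print(groups_non_capturing)  # ('10',)
```

Both patterns match the same strings; the only difference is that (?:...) does not add an entry to the match's groups.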


alhazen
ndnenkov
  • Thanks, that works. However, I'm wondering if this can be achieved without introducing a 2nd capturing group, because I'm only interested in the first value (10)? – alhazen Jan 04 '16 at 18:02

Since \d isn't limited to the ASCII digits 0-9 in some regex flavors (see https://stackoverflow.com/a/6479605/5015529), I'd use: _R_?([0-9]+)(?:_[0-9]+)?\.txt$
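Whether this matters depends on the regex flavor. As one illustration, assuming Python 3 and str patterns, \d also matches Unicode decimal digits unless the re.ASCII flag is set. The file name below is made up for the demonstration.

```python
import re

# ARABIC-INDIC DIGIT ONE and ZERO: decimal digits, but not ASCII.
arabic_indic = "\u0661\u0660"
name = f"FileName_R{arabic_indic}.txt"  # hypothetical file name

# \d on a str pattern matches any Unicode decimal digit.
unicode_digits = re.search(r"_R_?(\d+)(?:_\d+)?\.txt$", name)
# [0-9] only matches the ten ASCII digits.
ascii_class = re.search(r"_R_?([0-9]+)(?:_[0-9]+)?\.txt$", name)
# re.ASCII restricts \d to [0-9] as well.
ascii_flag = re.search(r"_R_?(\d+)(?:_\d+)?\.txt$", name, re.ASCII)

print(unicode_digits is not None)  # True
print(ascii_class is None)         # True
print(ascii_flag is None)          # True
```

If the file names are known to contain only ASCII digits, both spellings behave the same; [0-9] just makes the intent explicit.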

Community