
I want my regex to match strings of arbitrary characters optionally followed by some digits, but if both groups are empty I want the match to fail. I am currently constructing the regex like this:

regex = u'^(.*)'
if has_digits: regex += u'(\d*)'
regex += ext + u'$' # extension group as in u'(\.exe)'
rePattern = re.compile(regex, re.I | re.U)

but this also matches empty filenames (extension only). I can't wrap my head around the answers to similar questions.

The extra complication is that the second group (the digits) may not be added to the pattern at all.

So valid:

abc%.exe
123.exe

If has_digits is true:

abc 123.exe # I want the second group to contain the 123 not the first one

Invalid:

.exe

Mr_and_Mrs_D

2 Answers


Regex:

^(.*?)(\d+)?(?<=.)\.exe$

The positive lookbehind (?<=.) ensures that at least one character precedes the extension.


Integrated:

regex = r'^(.*?)'
if has_digits: regex += r'(\d+)?'
regex += r'(?<=.)' + ext + '$'
rePattern = re.compile(regex, re.I | re.U)
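
To check the integrated snippet against the filenames from the question, here is a minimal sketch. The helper name build_pattern is hypothetical (it is not part of the answer), and ext is assumed to be a regex fragment such as r'\.exe'.

```python
import re

def build_pattern(ext, has_digits):
    # Hypothetical helper wrapping the snippet above.
    # `ext` is assumed to be a regex fragment such as r'\.exe'.
    regex = r'^(.*?)'
    if has_digits:
        regex += r'(\d+)?'
    # The lookbehind requires at least one character before the extension.
    regex += r'(?<=.)' + ext + '$'
    return re.compile(regex, re.I | re.U)

with_digits = build_pattern(r'\.exe', has_digits=True)
without_digits = build_pattern(r'\.exe', has_digits=False)

for name in ['abc%.exe', '123.exe', 'abc 123.exe', '.exe']:
    m = with_digits.match(name)
    print(name, '->', m.groups() if m else None)
```

Note that with has_digits set, a purely numeric name such as 123.exe leaves the first group empty and puts the digits in the second group, and the second group is None when there are no trailing digits.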
revo
  • Thanks - what does the ? do in `(.*?)` ? Is it equivalent to `(.*)?` ? EDIT: followed your link - explains all :) Let me test this... – Mr_and_Mrs_D Oct 05 '16 at 21:39
  • Still not quite sure the ? is needed - if I omit it it will eat up the digits ? – Mr_and_Mrs_D Oct 05 '16 at 21:43
  • `(.*?)` differs from `(.*)?` in the sense that `.*?` is an un-greedy dot star quantifier (which can consume no character at all) but `(.*)?` is a greedy dot star quantifier that consumes as many characters as possible (the reason why digits are not captured by the second group). The latter is made optional by appending `?` to the grouping construct. – revo Oct 05 '16 at 21:46
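
The greedy versus un-greedy difference discussed in the comments above can be seen with a small sketch. The patterns below are simplified versions of the ones in this thread (no lookbehind, digits group always present), used only for illustration.

```python
import re

name = 'abc 123.exe'

# Greedy first group: it grabs the digits, leaving nothing for (\d*).
greedy = re.match(r'^(.*)(\d*)\.exe$', name)

# Un-greedy first group: it stops as early as possible, so (\d*) gets the digits.
lazy = re.match(r'^(.*?)(\d*)\.exe$', name)

print(greedy.groups())  # ('abc 123', '')
print(lazy.groups())    # ('abc ', '123')
```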

You can use this lookahead based regex:

ext = r'\.exe'

regex = r'^(?=.+\.)(.*?)'
if has_digits: regex += r'(\d*)'
regex += ext + '$'
rePattern = re.compile(regex, re.I | re.U)
# ^(?=.+\.)(.*?)(\d*)\.exe$


The lookahead (?=.+\.) ensures the presence of at least one character before the dot.
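
As a quick check of the lookahead version against the question's examples, here is a minimal sketch. It assumes has_digits is true and ext is r'\.exe'; the names pattern and results are only for illustration.

```python
import re

ext = r'\.exe'
# Same pattern as in the comment above: ^(?=.+\.)(.*?)(\d*)\.exe$
pattern = re.compile(r'^(?=.+\.)(.*?)(\d*)' + ext + '$', re.I | re.U)

results = {}
for name in ['abc%.exe', '123.exe', 'abc 123.exe', '.exe']:
    m = pattern.match(name)
    # Store the captured groups, or None when the name is rejected.
    results[name] = m.groups() if m else None
    print(name, '->', results[name])
```

Because the digits group here is (\d*) rather than an optional (\d+)?, it always participates in the match, so it yields an empty string instead of None when there are no trailing digits.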

anubhava
  • Thanks - interesting variation - just @revo got there first :) – Mr_and_Mrs_D Oct 05 '16 at 21:53
  • Yes, that also works. This is a bit faster due to lookahead and no optional group. – anubhava Oct 05 '16 at 21:55
  • Haha thanks (it would be unfair to unaccept but I will probably use yours then :) Your using `r''` just made me realize that: _Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string_ -> https://docs.python.org/2/reference/lexical_analysis.html#string-literals – Mr_and_Mrs_D Oct 05 '16 at 22:01