0

I am trying to extract a date string from a filename using a regular expression in a Python script. Here is an example filename:

'2012-09-25 ag.pdf'

To get the date string from this filename I am using the regex r'\d{4}[-]\d{1,2}[-]\d{1,2}', and it works fine.

However, some filenames contain a two-digit year, and for those I am trying another regex: r'\d{2}-\d{2}-\d{2}'

The problem is that '2012-09-25 ag.pdf' also matches the second pattern (r'\d{2}-\d{2}-\d{2}'), which causes issues in my script.

How can I use a regex in Python to match exactly two digits and no more?
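
The overlap described above can be reproduced with a short sketch. It assumes the script uses re.search (the question does not say which function is called); the variable names are illustrative only.

```python
import re

filename = '2012-09-25 ag.pdf'

# Pattern for 4-digit years, as given in the question.
four_digit = re.search(r'\d{4}[-]\d{1,2}[-]\d{1,2}', filename)

# Pattern for 2-digit years. search() scans the whole string, so it
# finds a match starting inside the 4-digit year.
two_digit = re.search(r'\d{2}-\d{2}-\d{2}', filename)

print(four_digit.group())  # 2012-09-25
print(two_digit.group())   # 12-09-25
```

The second pattern matches because nothing in it forbids extra digits immediately before the match.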

Jonathan Hall
Sony Khan

4 Answers

2

You can create one regex for both cases:

^\d{2,4}-\d{1,2}-\d{1,2}

Demo: https://regex101.com/r/nZwZ58/4/

The good thing about this first version is that it is simpler and more readable; the bad thing is that it will also match a date with a 3-digit year.

The next one is more specific but more verbose.

^\d\d(\d\d)?-\d{1,2}-\d{1,2}

Demo: https://regex101.com/r/nZwZ58/3/
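
As a rough check of how the two patterns differ, here is a minimal Python sketch. The helper find_date and the sample filenames other than '2012-09-25 ag.pdf' are made up for illustration.

```python
import re

# First version: accepts 2, 3 or 4 year digits.
LOOSE = re.compile(r'^\d{2,4}-\d{1,2}-\d{1,2}')
# Second version: accepts exactly 2 or exactly 4 year digits.
STRICT = re.compile(r'^\d\d(\d\d)?-\d{1,2}-\d{1,2}')


def find_date(pattern, filename):
    """Return the matched date text, or None if there is no match."""
    m = pattern.search(filename)
    return m.group() if m else None


# Hypothetical sample names: 4-digit, 2-digit and 3-digit years.
samples = ['2012-09-25 ag.pdf', '12-09-25 ag.pdf', '012-09-25 ag.pdf']
for name in samples:
    print(name, '|', find_date(LOOSE, name), '|', find_date(STRICT, name))
```

Only the 3-digit sample separates the two: LOOSE returns '012-09-25' for it, while STRICT returns None.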

Pablo
  • 1
    Will this not result in matches for filenames with 3 digit years? Which would probably be errors. – PyPingu Jun 10 '19 at 11:30
  • Yes, you are right @PyPingu. I added a second option that is more verbose but more accurate. Thanks for your comment. – Pablo Jun 10 '19 at 11:38
1

You have at least 3 options here:

First option: match the 4-digit-year date first and, if it matches, don't try to match the 2-digit-year date.

Second option: modify your 2-digit year option to be more restrictive:

r'^\d{2}-\d{2}-\d{2}'

Third option: use match on the basename, not search. match only matches from the start of the string, so if the name starts with 4 digits the 2-digit pattern will not match.
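
The first and third options could look roughly like this. It is only a sketch: the function names and the 'reports/' directory in the usage lines are hypothetical, and the patterns are the ones from the question.

```python
import os
import re

FOUR_DIGIT_YEAR = re.compile(r'\d{4}-\d{1,2}-\d{1,2}')
TWO_DIGIT_YEAR = re.compile(r'\d{2}-\d{2}-\d{2}')


def date_by_priority(path):
    """Option 1: try the 4-digit-year pattern first, then fall back."""
    name = os.path.basename(path)
    m = FOUR_DIGIT_YEAR.search(name) or TWO_DIGIT_YEAR.search(name)
    return m.group() if m else None


def two_digit_date_at_start(path):
    """Option 3: match() only succeeds at the start of the basename."""
    name = os.path.basename(path)
    m = TWO_DIGIT_YEAR.match(name)
    return m.group() if m else None


print(date_by_priority('reports/2012-09-25 ag.pdf'))        # 2012-09-25
print(date_by_priority('reports/12-09-25 ag.pdf'))          # 12-09-25
print(two_digit_date_at_start('reports/2012-09-25 ag.pdf'))  # None
```

Taking the basename first matters for match, since a directory prefix would otherwise sit at the start of the string.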

Jean-François Fabre
0

Assuming that your date will always be at the start of the filename, you could anchor your regex like so:

r'^\d{2}-\d{2}-\d{2}'

More docs here

EDIT: Could also use an or match:

r'^(\d{2}|\d{4})-\d{2}-\d{2}'
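
A small sketch of how the alternation behaves in Python follows. The helper split_date and the second and third filenames are invented for illustration. Note that this pattern requires two-digit months and days, unlike the \d{1,2} used in the question.

```python
import re

DATE_AT_START = re.compile(r'^(\d{2}|\d{4})-\d{2}-\d{2}')


def split_date(filename):
    """Return (full_date, year_digits), or None if there is no match."""
    m = DATE_AT_START.search(filename)
    if m is None:
        return None
    # Group 1 is the year alternative that matched (2 or 4 digits).
    return m.group(), m.group(1)


print(split_date('2012-09-25 ag.pdf'))  # ('2012-09-25', '2012')
print(split_date('12-09-25 ag.pdf'))    # ('12-09-25', '12')
print(split_date('2012-9-5 ag.pdf'))    # None
```

The length of the captured year tells the caller which of the two formats was found.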

PyPingu
0

If your filenames are always like this, you can prepend ^ to your regexp so that it matches only at the beginning.

Zsolt Botykai