0

I'm trying to create a regular expression that matches 11.11.11 but not 111.11.111. I'm using Python.

keyword = re.compile(r"[0-9]*[0-9]\.[0-9]*[0-9]\.[0-9]*[0-9]")

The date could be at the start or end of a sentence, with a newline rather than whitespace before or after it. How would I account for both? As it is, this will pick up 11.11.11 but also 111.11.11111, etc. :(

RY4N

3 Answers

3

* means "zero or more of the preceding token". Therefore your regex will match anything from 1.1.1 to 999999.999999.99999 etc.

You can be more specific like this:

keyword = re.compile(r"\b[0-9]{2}\.[0-9]{2}\.[0-9]{2}\b")

The \b word boundary anchors make sure that the numbers start/end at that position. Otherwise you could pick up substring matches (matching 34.56.78 in the string 1234.56.7890, for example).
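A small sketch of the effect of \b, using made-up sample text. The unanchored pattern is included only for comparison; note that a newline directly after a date still counts as a word boundary, which covers the start/end-of-line case from the question.

```python
import re

# Pattern from the answer: exactly two digits per group, anchored by \b.
keyword = re.compile(r"\b[0-9]{2}\.[0-9]{2}\.[0-9]{2}\b")

# Same pattern without the anchors, for comparison only.
unanchored = re.compile(r"[0-9]{2}\.[0-9]{2}\.[0-9]{2}")

# Hypothetical sample text: dates at the start of the string and right before a newline.
text = "11.11.11 starts the line\nand 111.11.111 is skipped, ends with 22.03.11\n"

matches = keyword.findall(text)
print(matches)

# Without \b, a substring of a longer number is picked up.
print(unanchored.search("1234.56.7890").group())
print(keyword.search("1234.56.7890"))
```

Keep in mind that \b treats letters and underscores as word characters too, so a date glued directly to a letter (as in x11.11.11) would not match.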

Of course, you'll still need to validate separately whether the match is actually a plausible date. Don't use regexes for this (it's possible but cumbersome); instead, use the datetime.strptime() classmethod from the datetime module.
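A minimal sketch of that two-step approach: the regex finds candidates and strptime() decides whether each one is a real date. The helper name find_dates and the day.month.year format string are assumptions for illustration; the question does not say which order the fields are in.

```python
import re
from datetime import datetime

keyword = re.compile(r"\b[0-9]{2}\.[0-9]{2}\.[0-9]{2}\b")

def find_dates(text, fmt="%d.%m.%y"):
    """Return datetime objects for regex candidates that parse as real dates.

    The day.month.year order in `fmt` is an assumption; adjust it to your data.
    """
    dates = []
    for candidate in keyword.findall(text):
        try:
            dates.append(datetime.strptime(candidate, fmt))
        except ValueError:
            # Has the right shape but is not a valid date (e.g. day 99); skip it.
            pass
    return dates

found = find_dates("Due 23.03.11, not 99.99.99")
print(found)
```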

Tim Pietzcker
2

You can use \b to match a word boundary. For example, you could make your regular expression:

 re.compile(r'\b\d{2}\.\d{2}\.\d{2}\b')

I've also used \d to match any digit and the {2} suffix to match two instances of whatever came previously. If you want to match either 1 or 2 digits in any of those cases, you could change the {2} to {1,2}.
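As a quick illustration of the {1,2} variant, here is a sketch run against a few made-up sample strings:

```python
import re

# One or two digits per group, still anchored with \b.
flexible = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{1,2}\b")

# Hypothetical samples: the first three should match, the last two should not.
samples = ["1.1.1", "11.11.11", "3.12.09", "111.11.11", "11.11.111"]

results = {s: bool(flexible.search(s)) for s in samples}
for sample, matched in results.items():
    print(sample, matched)
```

In Python 3, \d on a str pattern also matches non-ASCII Unicode digits; use [0-9] or pass re.ASCII if only 0 to 9 should count.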

Mark Longair
1

Try using ? instead of * as the quantifier.

The ? matches 0 or 1 instances of the previous element. In other words, it makes the element optional; it can be present, but it doesn't have to be.

This will match both 1.1.1 and 11.11.11, but not 1111.1111.1111:

keyword = re.compile(r"\b[0-9]?[0-9]\.[0-9]?[0-9]\.[0-9]?[0-9]\b")
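A quick check of this pattern on a string with too many digits at the ends, with and without the \b anchors. The unanchored variant is added here only for comparison; it shows why the anchors matter when using re.search.

```python
import re

with_b = re.compile(r"\b[0-9]?[0-9]\.[0-9]?[0-9]\.[0-9]?[0-9]\b")
# Same pattern minus the word boundaries, for comparison only.
without_b = re.compile(r"[0-9]?[0-9]\.[0-9]?[0-9]\.[0-9]?[0-9]")

s = "111.1.111"

anchored_search = with_b.search(s)        # no match anywhere in s
unanchored_search = without_b.search(s)   # finds a substring inside s
unanchored_match = without_b.match(s)     # must start at index 0, so no match

print(anchored_search)
print(unanchored_search.group())
print(unanchored_match)
```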
gparis
  • 1
    Sorry, but that does match 111.1.111, because it's not explicitly ruling out more digits at the ends. – RY4N Mar 23 '11 at 08:20
  • @Ryan It will not _match_ `111.1.111`, because `re.match` only matches at the start of the string. However, `re.search` will still return a match, and that's what you meant, I guess. – Lauritz V. Thaulow Mar 23 '11 at 10:14