Can someone explain how to use re.findall to extract only the dates from the following strings? The date can be in either format, 1.1.2001 or 11.11.2001, so the number of digits for the day and the month varies:

import re
text = "This is my date: 1.1.2001 fooo bla bla bla"
text2 = "This is my date: 11.11.2001 bla bla foo bla"

I know I should use re.findall(pattern, string), but to be honest I am completely confused by those patterns. I don't know how to assemble a pattern that fits my case.

I have found something like this, but I have no idea why there is the letter r before the pattern. Does \ mean start of string? Does d mean digit? And does the number in {} mean how many?

match = re.search(r'\d{2}.\d{2}.\d{4}', text)

Thanks a lot!

Abhisek Roy
Slav3k
  • You should have a look at [the doc for re](https://docs.python.org/3/library/re.html), and you can use https://regex101.com/ to test regular expressions interactively. – Thierry Lathuille Feb 20 '18 at 20:59
  • This is just perfect ... this online regular expression tester! Thanks! – Slav3k Feb 21 '18 at 07:07

3 Answers

The r prefix tells the Python interpreter that the string is a raw string, which essentially means backslashes \ are no longer treated as escape characters and stay literal backslashes. This is useful for the re module because patterns use backslashes a lot, so to avoid writing \\ everywhere (escaping each backslash) most people use a raw string instead.

What you're looking for is this:

match = re.search(r'\d{1,2}\.\d{1,2}\.\d{4}', text)

The {} tells the regex how many occurrences of the preceding element you want. {1,2} means a minimum of 1 and a maximum of 2 occurrences of \d, and {4} means exactly 4 occurrences.

Note that the . is also escaped as \., since in regex an unescaped . matches any character (except a newline by default). Here you are looking for a literal ., so you escape it to tell the regex to match that exact character.

See this for more explanation: https://regex101.com/r/v2QScR/1
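As a minimal sketch, here is that pattern used with re.findall on the two strings from the question. The names DATE_PATTERN, text and text2 are only illustrative, and the last two calls show what goes wrong when the dots are left unescaped.

```python
import re

# 1-2 digits, a literal dot, 1-2 digits, a literal dot, then exactly 4 digits.
DATE_PATTERN = r'\d{1,2}\.\d{1,2}\.\d{4}'

text = "This is my date: 1.1.2001 fooo bla bla bla"
text2 = "This is my date: 11.11.2001 bla bla foo bla"

dates = re.findall(DATE_PATTERN, text)
dates2 = re.findall(DATE_PATTERN, text2)
print(dates)   # ['1.1.2001']
print(dates2)  # ['11.11.2001']

# With unescaped dots, "." matches any character, so non-dates slip through.
loose = re.findall(r'\d{1,2}.\d{1,2}.\d{4}', "id 1x1x2001")
strict = re.findall(DATE_PATTERN, "id 1x1x2001")
print(loose)   # ['1x1x2001']
print(strict)  # []
```

re.findall always returns a list, so an empty list means no date was found.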

r.ook

There are actually two distinct processes happening in this code.

  1. When you enter some text "...", the string literal is first processed by the Python interpreter
  2. Then the Python interpreter passes the resulting string to its own internal regex interpreter

In order to match a class of characters such as digits, Python's internal regex interpreter supports special sequences like \d. So the regex interpreter is expecting to receive \d. Unfortunately, the character \ is also an escape character for the Python interpreter in the first step of the process.

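To keep the Python interpreter from processing \ sequences before the regex interpreter sees them, we put r in front of the string, as in r"...", to mark it as a "raw string", which means "Hey Python interpreter, don't touch my \ characters!". (Python happens to leave unrecognized escapes such as \d alone in a normal string, with a warning in recent versions, but recognized ones such as \b, a backspace, are converted.) This way the correct special sequences are passed through to the regex interpreter.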
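To make the two steps concrete, here is a small sketch comparing a normal and a raw string that both contain \b. In a normal string, \b becomes a single backspace character; in a raw string, it reaches the regex engine as backslash plus b, which is the word-boundary sequence. The variable names are only illustrative.

```python
import re

normal = '\bfoo\b'   # Python turns each \b into one backspace character
raw = r'\bfoo\b'     # backslashes are kept, so re sees the word-boundary \b

print(len(normal))  # 5: backspace, f, o, o, backspace
print(len(raw))     # 7: \, b, f, o, o, \, b

# A raw string is just another way to write the doubled-backslash form.
same = (r'\d' == '\\d')
print(same)  # True

found_raw = re.findall(raw, "foo food")        # whole word "foo" only
found_normal = re.findall(normal, "foo food")  # looks for literal backspaces
print(found_raw)     # ['foo']
print(found_normal)  # []
```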

AlanSTACK

The r prefix makes a raw string, which means the backslashes \ in it are not treated as escape characters, so the string is passed to re unaltered.

The Python documentation describes \ like this:

Either escapes special characters (permitting you to match characters like '*', '?', and so forth), or signals a special sequence;

Basically, if you put \ before a character that would normally be special in a regex, that character is matched literally instead.

{} are used for repetitions. The documentation describes the non-greedy form {m,n}? like this:

Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 'a' characters, while a{3,5}? will only match 3 characters.

Meaning that the preceding element is repeated the number of times you specify in {}.
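The example from the documentation excerpt can be checked directly. This is a small sketch with illustrative variable names, showing the greedy {3,5}, the non-greedy {3,5}? and an exact count {4}:

```python
import re

s = 'aaaaaa'  # the 6-character string from the documentation example

greedy = re.match(r'a{3,5}', s).group()   # takes as many as allowed
lazy = re.match(r'a{3,5}?', s).group()    # takes as few as allowed
print(greedy)  # aaaaa
print(lazy)    # aaa

# {4} means exactly four repetitions of the preceding \d.
exact = re.findall(r'\d{4}', 'year 2001, code 12')
print(exact)   # ['2001']
```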

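\d is a special sequence that matches any decimal digit (0 to 9; for Unicode str patterns it also matches digits from other scripts unless the re.ASCII flag is set).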

I highly recommend this tutorial

re.findall() returns a list of everything it matches using that regex.
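A short sketch of what that list looks like, using a made-up sentence with two dates. Without groups in the pattern, re.findall returns the matched strings; with groups, it returns one tuple of group contents per match. re.search, by contrast, stops at the first match.

```python
import re

# Hypothetical input containing two dates.
text = "Start 1.1.2001, end 11.11.2001."

all_dates = re.findall(r'\d{1,2}\.\d{1,2}\.\d{4}', text)
print(all_dates)  # ['1.1.2001', '11.11.2001']

# Parentheses create groups, so each match becomes a (day, month, year) tuple.
parts = re.findall(r'(\d{1,2})\.(\d{1,2})\.(\d{4})', text)
print(parts)  # [('1', '1', '2001'), ('11', '11', '2001')]

# re.search returns a match object for the first match only (or None).
first = re.search(r'\d{1,2}\.\d{1,2}\.\d{4}', text)
first_date = first.group()
print(first_date)  # 1.1.2001
```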

Xantium