When you write this:
r"(.+) \1"
\1 must match exactly the text that the first group captured. It didn't match "abc cde" because the first group captured "abc", so it is as if you were matching re.match(r'abc abc', text), and "cde" is not "abc".
This is called a backreference to a group.
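A quick way to see this in practice is the minimal sketch below, using Python's standard re module. The sample strings are only illustrative:

```python
import re

pattern = r"(.+) \1"

# "abc cde": group 1 must end right before the space, so it captures "abc".
# \1 then requires "abc" again, but the text continues with "cde": no match.
print(re.match(pattern, "abc cde"))  # None

# "abc abc": group 1 captures "abc" and \1 matches that same "abc" again.
m = re.match(pattern, "abc abc")
print(m.group(0))  # abc abc
print(m.group(1))  # abc
```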
For example, suppose you need to match text that starts and ends with the same letter:
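import re

pattern = r"(\w).+\1"
# fullmatch anchors both ends, so the last letter must equal the first one
match = re.fullmatch(pattern, "ABA")  # match: first and last letters are both "A"
match = re.fullmatch(pattern, "ABC")  # None: "A" != "C"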
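A caveat worth checking, sketched below: re.match anchors the pattern only at the start of the string, so with it r"(\w).+\1" also accepts text such as "ABAC", where the first letter merely reappears somewhere later. Requiring the whole string to match (here with re.fullmatch) makes the check really about the first and last letters. The test strings are made up for illustration:

```python
import re

pattern = r"(\w).+\1"

# re.match anchors only the start: "ABAC" matches because its prefix "ABA" does.
print(bool(re.match(pattern, "ABAC")))      # True

# re.fullmatch requires the entire string to match, so the last letter counts.
print(bool(re.fullmatch(pattern, "ABAC")))  # False
print(bool(re.fullmatch(pattern, "ABA")))   # True
print(bool(re.fullmatch(pattern, "ABC")))   # False
```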
Another example: match text that starts with three letters and ends with the same three letters in reverse order:
import re

pattern = r"(\w)(\w)(\w)\3\2\1"
re.fullmatch(pattern, 'ABCCBA')  # match
re.fullmatch(pattern, 'ABCCBC')  # None: the last letter would have to be "A"
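To see what each group actually holds, here is a small sketch that prints the captured groups. It also uses re.search, which scans the string, to find the mirrored six letters inside a longer piece of text (the sample strings are made up):

```python
import re

pattern = r"(\w)(\w)(\w)\3\2\1"

# Each (\w) captures one letter; \3\2\1 must repeat them in reverse order.
m = re.fullmatch(pattern, "ABCCBA")
print(m.groups())  # ('A', 'B', 'C')

# "ABCCBC" fails: \1 requires the last letter to be "A", not "C".
print(re.fullmatch(pattern, "ABCCBC"))  # None

# re.search tries every starting position, so it finds the mirrored part.
print(re.search(pattern, "xxABCCBAyy").group())  # ABCCBA
```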
Note: you can only back-reference a capturing group. This means that (?:.+) \1 is not valid: (?:...) is a non-capturing group, so it matches text without capturing anything, and there is no group 1 to refer back to. Python's re rejects such a pattern with re.error.
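A small sketch to confirm this with Python's re module. Note that compiles is a hypothetical helper written only for this illustration; it is not part of re:

```python
import re


def compiles(pattern):
    """Hypothetical helper: True if `pattern` compiles, False on re.error."""
    try:
        re.compile(pattern)
    except re.error as exc:
        print(f"{pattern!r} -> re.error: {exc}")
        return False
    return True


# (?:...) defines no group number, so \1 has nothing to refer to.
print(compiles(r"(?:.+) \1"))  # False
# With a capturing group, the same backreference is fine.
print(compiles(r"(.+) \1"))    # True
```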
Edit: + matches the preceding item one or more times, so it requires at least one occurrence, while * matches it zero or more times.
ca+t matches cat, caat and caaat: a c, followed by at least one a, followed by t.
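A minimal check of ca+t against a few sample words (chosen only for illustration), using re.fullmatch so that the whole word has to match:

```python
import re

# "+" requires at least one "a" between "c" and "t".
words = ["ct", "cat", "caat", "caaat"]
results = {w: bool(re.fullmatch(r"ca+t", w)) for w in words}
print(results)  # {'ct': False, 'cat': True, 'caat': True, 'caaat': True}
```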
ca*t matches ct, cat and caaaaat: a c, followed by zero or more a, followed by t.
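The same kind of check for ca*t, again with illustrative sample words. The only difference from ca+t shows up on "ct", which has zero a characters:

```python
import re

# "*" also accepts zero "a"s, so "ct" matches ca*t.
words = ["ct", "cat", "caaaaat", "cot"]
results = {w: bool(re.fullmatch(r"ca*t", w)) for w in words}
print(results)  # {'ct': True, 'cat': True, 'caaaaat': True, 'cot': False}

# With "+" instead of "*", "ct" no longer matches.
print(bool(re.fullmatch(r"ca+t", "ct")))  # False
```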