0

I am using Python. I want to write a regex that allows the following examples:

Day
Dday
Daay
Dayy
Ddaay
Ddayy
...

So each letter of the word may appear one or more times. How can I write this easily? Is there an expression that makes it easy? I have a lot of words. Thanks.

Kevin
Brian M Litwak

4 Answers

1

We can try using the following regex pattern:

^([A-Za-z])\1*([A-Za-z])\2*([A-Za-z])\3*$

Each group matches and captures a single letter, and the backreference that follows it (\1, \2, \3) matches zero or more further occurrences of that same captured letter. With the re.I flag the repeats may differ in case from the captured letter. Note that the pattern accepts any three letters, not only D, a and y.

Code:

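import re

word = "DdddddAaaaYyyyy"
# re.I lets each backreference match the captured letter in either case
matchObj = re.match(r'^([A-Za-z])\1*([A-Za-z])\2*([A-Za-z])\3*$', word, re.M | re.I)

if matchObj:
    # group() is the whole match; group(1)..group(3) are the three captured letters
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
    print("matchObj.group(3) : ", matchObj.group(3))
else:
    print("No match!!")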
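
Because the pattern captures whatever letters it finds, it is not specific to "Day". The following is a minimal sketch to make that visible; the helper name is_three_stretched_letters is hypothetical and not part of the answer above.

```python
import re

# Same pattern as above. fullmatch() already requires the whole string to match,
# so the ^ and $ anchors are redundant here but harmless.
THREE_LETTERS = re.compile(r'^([A-Za-z])\1*([A-Za-z])\2*([A-Za-z])\3*$', re.I)

def is_three_stretched_letters(text):
    """Return True if text consists of three letters, each repeated one or more times."""
    return THREE_LETTERS.fullmatch(text) is not None

for candidate in ["Day", "DdddddAaaaYyyyy", "Dog", "Da", "Days"]:
    print(candidate, is_three_stretched_letters(candidate))
```

"Dog" is accepted just like "Day", while "Da" and "Days" are rejected, so this approach fits when only the shape of the word matters rather than its exact letters.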

Demo

Tim Biegeleisen
1

To match a character one or more times, you can use the + quantifier. To build the full pattern dynamically, split the word into characters and add a + after each of them:

pattern = "".join(char + "+" for char in word)

Then just match the pattern case-insensitively.

Demo:

>>> import re
>>> word = "Day"
>>> pattern = "".join(char + "+" for char in word)
>>> pattern
'D+a+y+'
>>> words = ["Dday", "Daay", "Dayy", "Ddaay", "Ddayy"]
>>> all(re.match(pattern, word, re.I) for word in words)
True
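
Two caveats may be worth checking before using this on many words: re.match only anchors at the start of the string, and characters such as "." or "+" inside a word would be read as regex syntax. A minimal sketch, assuming whole-string matching is what is wanted; the function name build_stretch_pattern is hypothetical:

```python
import re

def build_stretch_pattern(word):
    """Compile a pattern matching `word` with every character repeated one or more times."""
    # re.escape keeps characters such as "." or "+" literal.
    body = "".join(re.escape(char) + "+" for char in word)
    return re.compile(body, re.IGNORECASE)

pattern = build_stretch_pattern("Day")

print(bool(pattern.match("Dayxyz")))      # match() only anchors at the start
print(bool(pattern.fullmatch("Dayxyz")))  # fullmatch() must consume the whole string
print(bool(pattern.fullmatch("DDaayy")))
```

With match() the trailing "xyz" is ignored, so "Dayxyz" is accepted; fullmatch() rejects it.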
Eugene Yarmash
0

Try /d+a+y+/gi:

  • d+ Matches d one or more times.
  • a+ Matches a one or more times.
  • y+ Matches y one or more times.
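
The /.../gi notation is a regex literal with flags, which Python does not have. A rough Python equivalent, assuming the i flag maps to re.IGNORECASE and the g flag to finding all matches with findall; the sample text is made up for illustration:

```python
import re

# i flag -> re.IGNORECASE; g flag -> findall() returns every non-overlapping match
pattern = re.compile(r"d+a+y+", re.IGNORECASE)

text = "Day Dday xx Daayy dy"  # illustrative input, not from the question
print(pattern.findall(text))
```

Here "dy" is skipped because a+ requires at least one a.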
Ethan
0

As described in my original comment, the code below does exactly that.

Since you want to be able to use this on many words, I think this is what you're looking for.

import re

word = "day"

regex = r"^"+("+".join(list(word)))+"+$"

test_str = ("Day\n"
    "Dday\n"
    "Daay\n"
    "Dayy\n"
    "Ddaay\n"
    "Ddayy")

matches = re.finditer(regex, test_str, re.IGNORECASE | re.MULTILINE)

# Number the matches from 1 and report where each one was found
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

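This works by splitting the word into a list of characters, joining them back into a string with + between them, and appending a final + so that the last character is also repeatable. The ^ and $ anchors are added around the result, so the resulting regex is ^d+a+y+$. Since the input you presented is separated by newline characters, I've added re.MULTILINE so that ^ and $ match at the start and end of each line.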
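
Since the question mentions having a lot of words, the same construction can be applied to each word in turn. The following is only a sketch under that assumption; the function names and the word list are hypothetical, and re.escape is added in case a word contains regex metacharacters.

```python
import re

def compile_stretch_patterns(words):
    """Map each base word to a compiled pattern allowing every character 1+ times."""
    return {
        word: re.compile("".join(re.escape(c) + "+" for c in word), re.IGNORECASE)
        for word in words
    }

def base_words(text, patterns):
    """Return the base words whose stretched pattern matches the whole of `text`."""
    return [word for word, pattern in patterns.items() if pattern.fullmatch(text)]

patterns = compile_stretch_patterns(["day", "night", "good"])  # illustrative word list

for sample in ["Ddaay", "NIIIGHT", "god", "goood", "dai"]:
    print(sample, base_words(sample, patterns))
```

A word with a doubled letter such as "good" becomes g+o+o+d+, which needs at least two o characters, so "god" is not accepted while "goood" is.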

ctwheels