I'm new to Python and to Regex. Here is my current problem, for which I have not managed to find any straight answer online. I have a string of 5 or more characters, for which I need to search for all the possible combinations of 5 characters.

I wonder if it's doable with regular expressions (instead of, say, creating a list of all possible 5-character combinations and then testing them in a loop against my string).

For example, if my string is "stackoverflow", I need an expression that gives me a list of all the possible runs of 5 successive letters, such as ['stack', 'tacko', 'ackov', ...] (but not 'stcko' or 'wolfr', for example).

Here is what I tried:

import re

word = "stackoverflow"
# Avoid naming the variable `list`, which would shadow the built-in type.
matches = re.findall(r".....", word)
print(matches)

But printing this list would only give:

['stack', 'overf']

So it seems that each position can be matched only once: a 5-character match cannot include a position that an earlier match has already consumed.

Could anyone help me better understand how regex works in this situation, and whether what I want is even possible using regular expressions directly?

Thanks!

3 Answers

0

I think you could just use a simple loop with a sliding window of size 5:

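word = "stackoverflow"
result = []
# A string of length n has n - 5 + 1 windows of size 5,
# so the last start index is len(word) - 5.
for i in range(len(word) - 4):
    result.append(word[i:i+5])
print(result)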

This is quite efficient: for a fixed window size it runs in O(n) time, where n is the length of the string.
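The same idea can be wrapped in a small function so that the window size is a parameter. This is only a sketch: the name `sliding_windows` and its `size` argument are made up for illustration. A string of length n has n - size + 1 windows, and `range()` is simply empty when the string is shorter than the window.

```python
def sliding_windows(text, size=5):
    """Return every run of `size` consecutive characters in `text`."""
    # Start indices run from 0 to len(text) - size inclusive.
    # If text is shorter than size, range() is empty and the result is [].
    return [text[i:i + size] for i in range(len(text) - size + 1)]


print(sliding_windows("stackoverflow"))
print(sliding_windows("abc"))  # too short for a window of 5
```

With "stackoverflow" this yields 9 windows, from 'stack' through 'rflow'.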

0

If the letters are always consecutive, this will work:

wd = "stackoverflow"
# A slice of a string is already a string, so no join is needed.
lst = [wd[i:i+5] for i in range(len(wd) - 4)]
print(lst)

Output

['stack', 'tacko', 'ackov', 'ckove', 'kover', 'overf', 'verfl', 'erflo', 'rflow']
Mike67
0

This happens because, as the findall docstring says, it returns all non-overlapping matches:

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

Have a look at the solutions in this thread that do not use regex.
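That said, if a regex is still wanted, one common way around the non-overlapping behaviour is to put the 5 characters inside a lookahead. A lookahead `(?=...)` checks what follows without consuming it, so every match is empty and the scan moves forward one position at a time; the capturing group inside it is what findall returns. A minimal sketch (the variable name `windows` is just for illustration; note that `.` does not match a newline by default):

```python
import re

word = "stackoverflow"
# (?=(.{5})) succeeds at every position that is followed by at least
# 5 characters, captures them, and consumes nothing, so the windows overlap.
windows = re.findall(r"(?=(.{5}))", word)
print(windows)
# ['stack', 'tacko', 'ackov', 'ckove', 'kover', 'overf', 'verfl', 'erflo', 'rflow']
```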

AmaHacka