How do I use a Python regex to match every possible sequence of 1, 2, 3, or 4 words in a string? The sequences must consist of adjacent words only. For example:
import re
str1 = 'AA BB CC DD EE FF GG HH'
matches = re.findall(r'insert ninja regex here', str1)
for match in matches:
    print(match)
Should output:
AA
AA BB
BB
AA BB CC
BB CC
CC
AA BB CC DD
BB CC DD
CC DD
DD
BB CC DD EE
CC DD EE
DD EE
EE
... and so on, ending with HH
Thanks
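For comparison, the expected listing above can also be produced without a regex: split on whitespace once, then join slices of 1 to 4 words. This is a minimal sketch, not one of the posted answers; the function name `adjacent_word_sequences` is hypothetical, and it assumes words are separated by whitespace and that rejoining them with single spaces is acceptable. It reproduces the order shown above (grouped by each sequence's last word, longest first).

```python
def adjacent_word_sequences(text, max_len=4):
    """Yield every run of 1..max_len adjacent words (hypothetical helper).

    Runs are grouped by their last word, longest run first, which is the
    order of the expected output in the question.
    """
    words = text.split()
    for end in range(1, len(words) + 1):
        # A run ending at words[end - 1] starts at most max_len words back.
        for start in range(max(0, end - max_len), end):
            yield ' '.join(words[start:end])


str1 = 'AA BB CC DD EE FF GG HH'
for seq in adjacent_word_sequences(str1):
    print(seq)
```

With 8 words this yields 8 + 7 + 6 + 5 = 26 sequences.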
Possible solution with four regexes, one per sequence length (let me know if you have a faster way of doing this):
matches4 = re.findall(r'(?=((?:\s\S+){3}\s\S+))', str1)
matches3 = re.findall(r'(?=((?:\s\S+){2}\s\S+))', str1)
matches2 = re.findall(r'(?=((?:\s\S+){1}\s\S+))', str1)
matches1 = re.findall(r'(?=(\s\S+))', str1)
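As written, each of these patterns starts with `\s`, so a match can only begin at whitespace: the first word `AA` never starts a match, and every captured sequence carries a leading space (for example `' BB'`). A minimal sketch of one way around that, assuming words are runs of non-whitespace: anchor each match at a word start with the negative lookbehind `(?<!\S)` (start of string, or right after whitespace) and put `\S+` first. The helper name `ngrams` is hypothetical. Results come back grouped by length rather than in the interleaved order shown in the question.

```python
import re

str1 = 'AA BB CC DD EE FF GG HH'


def ngrams(text, n):
    # (?<!\S) only allows a match at the start of the string or right after
    # whitespace, so 'AA' is included and no match begins mid-word.
    # The lookahead (?=(...)) consumes nothing, so consecutive matches can
    # overlap; re.findall returns the captured group for each position.
    pattern = r'(?<!\S)(?=(\S+(?:\s+\S+){%d}))' % (n - 1)
    return re.findall(pattern, text)


matches4 = ngrams(str1, 4)
matches3 = ngrams(str1, 3)
matches2 = ngrams(str1, 2)
matches1 = ngrams(str1, 1)
print(matches4)
```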
THE RESULTS ARE IN:
I timed all four answers on a string of 138.2k characters (22.2k words):
my answer: 0.0856 s
zx81's answer, option 1: 0.0598 s
zx81's answer, option 2: 0.0905 s
Greg Hewgill's answer: 0.0293 s
THE WINNER IS GREG! However, zx81's answer gets the accept, since I asked for a regex solution. You all got an upvote.
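The post does not show how the timings were taken. Below is a minimal harness sketch using the standard-library `timeit` module. The input string is synthetic (an assumption, not the 138.2k-character text used above); `regex_ngrams` runs one anchored lookahead regex per length, and `split_ngrams` is a generic split-and-slice approach. Neither is a copy of the answers being compared, and both function names are hypothetical.

```python
import re
import timeit


def regex_ngrams(text, max_len=4):
    # One lookahead regex per length, anchored at word starts.
    out = []
    for n in range(1, max_len + 1):
        out += re.findall(r'(?<!\S)(?=(\S+(?:\s+\S+){%d}))' % (n - 1), text)
    return out


def split_ngrams(text, max_len=4):
    # Non-regex alternative: split once, then join word slices.
    words = text.split()
    return [' '.join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)]


# Synthetic input (hypothetical): 20,000 short space-separated words.
text = ' '.join('W%d' % (i % 100) for i in range(20000))

# Both return sequences ordered by length, then by position, so the lists
# should be identical; check that before comparing speed.
assert regex_ngrams(text) == split_ngrams(text)

for func in (regex_ngrams, split_ngrams):
    seconds = timeit.timeit(lambda: func(text), number=5) / 5
    print('%s: %.4f s per run' % (func.__name__, seconds))
```

Timings depend on the machine, the Python version, and the shape of the input, so only relative differences measured on the same text are meaningful.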