I have a sentence like so:
s = " foo hello hello hello I am a big mushroom a big mushroom hello hello bye bye bye bye foo"
I would like to find all the consecutive repetitions of sequences of words and the number of times each sequence is repeated. For the example above:
[('hello', 3), ('a big mushroom', 2), ('hello', 2), ('bye', 4)]
I have a regex-based solution that almost works when every word is a single character, but I can't extend it to real words:
import re

def count_repetitions(sentence):
    # Join the one-character words, then match a group followed by one or more copies of itself
    return [(list(t[0]), ''.join(t).count(t[0])) for t in re.findall(r'(\w+)(\1+)', ''.join(sentence))]

l = ['x', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'g', 'h', 'i', 'i', 'i', 'i', 'a', 'b', 'c', 'd']
count_repetitions(l)
>>> [(['a', 'b', 'c'], 3), (['g', 'h'], 2), (['i', 'i'], 2)]
Note that I would like (['i'], 4) for the last element instead of (['i', 'i'], 2).
Each word is separated by a space character.
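One direction that might work, offered as a minimal sketch rather than a definitive answer: keep the backreference idea but make the first group lazy and word-aware. The pattern below is an assumption of mine, not something verified beyond the two examples in this question. It assumes words consist of `\w` characters and are separated by exactly one space, and it prefers the shortest repeating unit, which is what turns 'i i i i' into ('i', 4).

```python
import re

# Group 1: one or more whole words, matched lazily so the shortest unit wins.
# Group 2: one or more further copies of group 1, each preceded by a single space.
# The \b anchors keep the match from starting or ending in the middle of a word.
PATTERN = re.compile(r'\b(\w+(?: \w+)*?)((?: \1\b)+)')

def count_repetitions(sentence):
    results = []
    for m in PATTERN.finditer(sentence):
        unit, repeats = m.group(1), m.group(2)
        # Each extra copy in group 2 is " " + unit, so its length gives the copy count.
        results.append((unit, 1 + len(repeats) // (len(unit) + 1)))
    return results

s = " foo hello hello hello I am a big mushroom a big mushroom hello hello bye bye bye bye foo"
print(count_repetitions(s))
# [('hello', 3), ('a big mushroom', 2), ('hello', 2), ('bye', 4)]
```

For the list of single-character words, pass ' '.join(l) instead of ''.join(l). Because the unit is matched lazily, an input such as "a a b a a b" reports ('a', 2) twice rather than ('a a b', 2), so this only fits if the shortest unit is really what is wanted. The lazy search can also get slow on long sentences, since every word is tried as a possible start of a repeating unit.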