
Another user already opened a discussion on how to find repeated phrases in Python, but it focused only on three-word phrases.

Robert Rossney's answer was complete and works (it is in the question "repeated phrases in the text Python"), but can I ask for a method that simply finds repeated phrases regardless of their length? I think it should be possible to build on the method from the previous discussion, but I am not sure how to do it.

I think this is the function that would need to be modified to return tuples of different lengths:

def phrases(words):
    phrase = []
    for word in words:
        phrase.append(word)
        if len(phrase) > 3:
            phrase.remove(phrase[0])
        if len(phrase) == 3:
            yield tuple(phrase)
hugi coapete

1 Answer


One simple modification is to pass the phrase length (in words) to the phrases function as a parameter, and then call the function once for each length you are interested in.

def phrases(words, wlen):
    # Yield every run of wlen consecutive words as a tuple.
    phrase = []
    for word in words:
        phrase.append(word)
        if len(phrase) > wlen:
            phrase.pop(0)  # drop the oldest word so the window stays at wlen
        if len(phrase) == wlen:
            yield tuple(phrase)

And then define all_phrases as

def all_phrases(words):
    # One generator per phrase length, from 1 up to len(words) - 1.
    for n in range(1, len(words)):
        yield phrases(words, n)

And then one way of using it is

for w in all_phrases(words):
    for g in w:
        print(g)

For words = ['oer', 'the', 'bright', 'blue', 'sea'], it produces:

('oer',)
('the',)
('bright',)
('blue',)
('sea',)
('oer', 'the')
('the', 'bright')
('bright', 'blue')
('blue', 'sea')
('oer', 'the', 'bright')
('the', 'bright', 'blue')
('bright', 'blue', 'sea')
('oer', 'the', 'bright', 'blue')
('the', 'bright', 'blue', 'sea')
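
The code above only enumerates the phrases. To get at the repeated ones, one option is to count every phrase and keep those seen more than once. The following is a minimal sketch, not part of the original discussion: repeated_phrases and its min_len parameter are hypothetical names, it uses collections.Counter for the tallying, and phrases is rewritten with list slicing so the block runs on its own (it assumes words is a list).

```python
from collections import Counter


def phrases(words, wlen):
    # Sliding window of wlen consecutive words; for a list this yields the
    # same tuples as the append/pop version in the answer.
    for i in range(len(words) - wlen + 1):
        yield tuple(words[i:i + wlen])


def repeated_phrases(words, min_len=1):
    # Hypothetical helper: map each phrase that occurs more than once
    # to the number of times it occurs.
    counts = Counter()
    for wlen in range(min_len, len(words)):
        counts.update(phrases(words, wlen))
    return {p: c for p, c in counts.items() if c > 1}


words = "the cat sat on the mat and the cat sat down".split()
for phrase, count in sorted(repeated_phrases(words, min_len=2).items()):
    print(count, " ".join(phrase))
```

With that sample sentence it should print "2 cat sat", "2 the cat" and "2 the cat sat". Note that this counts a quadratic number of phrases, so for long texts you would probably want to cap the maximum phrase length.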
Sudeep Juvekar