Count repetitive words in string using regex?

Question

I have a string:

"Hi, hi Jane! I'm so. So glad to to finally be able to write - WRITE!! - to you!"

and I need to count pairs of repeated words.

def repetitionEncryption(letter):
    pattern = ???
    regex = re.compile(???)
    return len(re.findall(regex, letter))

Thank you for your attention.

You'd like us to provide you with a regex instead of "????" ? — Killer Death, Aug 31 '17 at 16:58
@KillerDeath How so? You can create **recursive regex**, see the answer I linked to above. — ctwheels, Aug 31 '17 at 17:01
Then you need to provide examples of the desired result, it's not clear what you want. — Brad Koch, Aug 31 '17 at 17:02
@AvoAsatryan I beg to differ: https://regex101.com/r/PLxLSi/1 — ctwheels, Aug 31 '17 at 17:03
Then the user should clarify what they want to do, because all the user mentioned is to *catch repetitive words* - which https://stackoverflow.com/a/2823037/3600709 does exactly — ctwheels, Aug 31 '17 at 17:04
I need to count repeated words, but the case of the words may differ, for example: write - WRITE!! — Avo Asatryan, Aug 31 '17 at 17:06
Like this: `\b(\w+)[^\w]+\1\b`? https://regex101.com/r/BCwpWO/1 — ctwheels, Aug 31 '17 at 17:07
Repetitive words anywhere within the string, or one after the other? If Jane repeats anywhere within the string, is that one more repetitive word to count? — Killer Death, Aug 31 '17 at 17:07
@AvoAsatryan if you're looking for repetition throughout the whole string then Coldspeed has your answer. If you're looking for immediate repetition (one word after the other) then `\b(\w+)[^\w]+\1\b` https://regex101.com/r/BCwpWO/1 (as per my previous comment) is your answer (note the modifier `i` so that it's case insensitive) — ctwheels, Aug 31 '17 at 17:11
@ctwheels, if Jane appears anywhere else in the string, your regex will not catch it. It relies on non word character separator, one or more of them actually — Killer Death, Aug 31 '17 at 17:11
@KillerDeath you didn't read my last comment, which explains my answer and Coldspeed's answer. The question is fairly broad when the word *repetition* is used. The user never specified immediate repetition vs full string repetition — ctwheels, Aug 31 '17 at 17:13
He said: situation of this words isn't important for this task — Killer Death, Aug 31 '17 at 17:13
Note that this was already marked a duplicate (but of an incorrect question). I changed it to reflect the correct one. — cs95, Aug 31 '17 at 18:15

cs95 · Accepted Answer · 2017-08-31T18:16:56.557

Note that this question was already marked as a duplicate by Community (but of an incorrect question). I changed it to reflect the correct one.

There's a similar question tagged with JavaScript, but its answer needs a little modification for Python.

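import re

text = "Hi, hi Jane! I'm so. So glad to to finally be able to write - WRITE!! - to you!"
# Capture every word that occurs again later in the string (case-insensitive).
repeats = re.findall(r'\b(\w+)\b(?=.*\b\1\b)', text, re.I)
print(repeats)
# ['Hi', 'so', 'to', 'to', 'to', 'write']

# Normalize case so that 'Hi' and 'hi' count as the same word.
repeats = list(map(str.lower, repeats))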

Now, create a counter.

from collections import Counter

c = Counter(repeats)
print(c)
# Counter({'to': 3, 'hi': 1, 'so': 1, 'write': 1})

Or, more primitively:

r_set = set(repeats)
c = {w: repeats.count(w) for w in r_set}
print(c)
# {'hi': 1, 'so': 1, 'to': 3, 'write': 1}  (key order may vary, since sets are unordered)

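Each value is the number of times the word is repeated, that is, one less than its total number of occurrences. If the value of 'hi' is 1, then 'hi' occurred twice; 'to' has a value of 3 because it occurs four times. And so on.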
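To make the relationship between repeat counts and total occurrences concrete, here is a minimal sketch that reuses the regex from this answer. The helper name `repeat_counts` is hypothetical; it is only a wrapper for illustration, not part of the question or of the `re` module.

```python
import re
from collections import Counter

def repeat_counts(text):
    """Map each lowercased word to the number of times it recurs later in text.

    Hypothetical helper, written only to illustrate the counting step.
    """
    # Each match is a word that appears again somewhere after it.
    repeats = re.findall(r'\b(\w+)\b(?=.*\b\1\b)', text, re.I)
    return Counter(word.lower() for word in repeats)

text = "Hi, hi Jane! I'm so. So glad to to finally be able to write - WRITE!! - to you!"
counts = repeat_counts(text)

# A word with k repeats occurs k + 1 times in total.
occurrences = {word: k + 1 for word, k in counts.items()}
print(occurrences)
# {'hi': 2, 'so': 2, 'to': 4, 'write': 2}
```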

The regex is

\b(\w+)\b(?=.*\b\1\b)

Details

\b - word boundary
(\w+) - capturing group for a word
\b - word boundary
(?=.*\b\1\b) - lookahead, consisting of
- .* any run of characters (newlines excluded unless re.S is set)
- \b\1\b the same word captured in the first group. Here, \1 is the reference to the first group.
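If only immediate repetition matters (one word directly after the other), the pattern suggested by ctwheels in the comments can be dropped into the function skeleton from the question. This is a sketch under that assumption; the question does not confirm which interpretation is intended.

```python
import re

def repetitionEncryption(letter):
    # Pattern from ctwheels's comment: a word, one or more non-word
    # characters, then the same word again. re.I makes the backreference
    # match regardless of case, so "write - WRITE" counts as a pair.
    regex = re.compile(r'\b(\w+)[^\w]+\1\b', re.I)
    return len(regex.findall(letter))

letter = "Hi, hi Jane! I'm so. So glad to to finally be able to write - WRITE!! - to you!"
print(repetitionEncryption(letter))
# 4
```

Because matches do not overlap, a run such as "to to to" counts as one pair here, not two.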

score 0 · Answer 2 · answered Aug 31 '17 at 16:59

One suggestion would be to split the sentence up into an array and then compare each item in the array. You wouldn't be using regex. With regex, you need to know what you are looking for ahead of time, for instance if you want to know how many times 'Jane' is in the sentence.
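A minimal sketch of that split-and-compare idea, without regex. It assumes "compare each item" means comparing each word with the one that follows it, and that any non-alphanumeric character separates words; the function name `count_adjacent_repeats` is hypothetical.

```python
def count_adjacent_repeats(sentence):
    # Replace every non-alphanumeric character with a space and lowercase
    # the rest, so punctuation and case do not affect the comparison.
    cleaned = ''.join(ch.lower() if ch.isalnum() else ' ' for ch in sentence)
    words = cleaned.split()
    # Compare each word with the word that follows it.
    return sum(1 for a, b in zip(words, words[1:]) if a == b)

sentence = "Hi, hi Jane! I'm so. So glad to to finally be able to write - WRITE!! - to you!"
print(count_adjacent_repeats(sentence))
# 4
```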

steveo314


Yes, I can write such code, but I can only use regex in this task – Avo Asatryan Aug 31 '17 at 17:03
Is this homework? – Killer Death Aug 31 '17 at 17:04
Seems like it now, Killer Death – steveo314 Aug 31 '17 at 17:34
