Note that the question was already marked as a duplicate by Community, but of the wrong question. I changed it to point to the correct one.
There's a similar question tagged with JavaScript, but its answer needs a little modification for Python.
import re
text = "Hi, hi Jane! I'm so. So glad to to finally be able to write - WRITE!! - to you!"
repeats = re.findall(r'\b(\w+)\b(?=.*\b\1\b)', text, re.I)
print(repeats)
['Hi', 'so', 'to', 'to', 'to', 'write']
# Normalize case so that 'Hi' and 'hi' are counted as the same word
repeats = list(map(str.lower, repeats))
Now, create a counter.
from collections import Counter
c = Counter(repeats)
print(c)
Counter({'to': 3, 'hi': 1, 'so': 1, 'write': 1})
Or, more primitively:
r_set = set(repeats)
c = {w: repeats.count(w) for w in r_set}
print(c)
{'hi': 1, 'so': 1, 'to': 3, 'write': 1}
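The value for each key is the number of repeats, that is, the number of occurrences after the first one. If the value of 'hi' is 1, that means 'hi' occurred twice; a value of 3 for 'to' means it occurred four times. And so on.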
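If you want the total number of occurrences rather than the number of repeats, one possible sketch is to tokenize the text and count the words directly, without the lookahead. The names `words`, `totals`, and `repeated` are only illustrative, and this assumes that splitting on `\w+` (so "I'm" becomes "i" and "m") is acceptable, as it is for the regex in this answer.

```python
import re
from collections import Counter

text = "Hi, hi Jane! I'm so. So glad to to finally be able to write - WRITE!! - to you!"

# Lowercase first, then split into runs of word characters.
words = re.findall(r"\w+", text.lower())

# Count every occurrence of every word.
totals = Counter(words)

# Keep only the words that occur more than once.
repeated = {w: n for w, n in totals.items() if n > 1}
print(repeated)  # {'hi': 2, 'so': 2, 'to': 4, 'write': 2}
```

Each total here is exactly one more than the corresponding repeat count from the lookahead approach.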
The regex is
\b(\w+)\b(?=.*\b\1\b)
Details
\b - word boundary
(\w+) - capturing group for a word
\b - word boundary
(?=.*\b\1\b) - lookahead, consisting of
  .* - anything
  \b\1\b - the same word captured in the first group. Here, \1 is the reference to the first group.
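To see the lookahead at work, here is a small sketch on two made-up strings (`single_line` and `multi_line` are illustrative inputs, not from the question). It shows that every occurrence except the last is reported, and that `.` does not cross a newline by default, so a repeat on a later line is only found if you add `re.S`.

```python
import re

pattern = r'\b(\w+)\b(?=.*\b\1\b)'

# Every occurrence that has a later copy is reported; the last copy is not.
single_line = "to be or not to be"
first = re.findall(pattern, single_line, re.I)
print(first)  # ['to', 'be']

# By default `.` stops at a newline, so the repeat on the next line is missed.
multi_line = "apple pie\nApple tart"
without_dotall = re.findall(pattern, multi_line, re.I)
print(without_dotall)  # []

# With re.S (DOTALL), `.*` can span lines and the repeat is found.
with_dotall = re.findall(pattern, multi_line, re.I | re.S)
print(with_dotall)  # ['apple']
```

If the text can contain line breaks, passing `re.I | re.S` is probably what you want.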