I'm trying to find all duplicated words in a text, store each duplicate in a tuple, and save all the tuples in a list. It needs to include cases with punctuation between the words, like "so, so".

I tried to use the pattern:

/(\b\S+\b)\s+\b\1\b/

but it doesn't return what I'm looking for, and I'm having trouble saving the results in the form I need.

Example of what I'm looking for:

the text = "i went to to a party, party at my uncle's house"

Output at the end of the function:

[(to ,to), (party, party)]
Wiktor Stribiżew
YoavP9
  • Regular expressions are not the right tool for this. Instead, try splitting up the string into words or tokens and check for duplicates with logic in a loop. – thshea Jun 07 '21 at 14:38
  • (`what [I'm] looking for` looks horrible: why do you need the repeated words repeated? (Do you *really need* the blank *before* the comma for even length words and `, ` for odd?)) – greybeard Jun 07 '21 at 16:29
  • Yes, it's a specific requirement in my class to build a regular expression for finding those duplicates. – YoavP9 Jun 08 '21 at 06:32
  • here : https://stackoverflow.com/questions/2823016/regular-expression-for-duplicate-words – The shape Jun 09 '21 at 15:40
  • thanks mate, helped me a lot – YoavP9 Jun 10 '21 at 15:35

1 Answer

Regex is for finding specific patterns, not words. What you should do is what @thshea said, or you can use this code:

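# Note: this reports every word that occurs more than once anywhere in the
# text, not only words repeated back to back.
the_text = "i went to to a party, party at my uncle's house"
# Drop commas, then split on whitespace.
words = the_text.replace(",", "").split()
# Remove the first occurrence of each distinct word; whatever is left over
# occurred more than once.
for word in set(words):
    words.remove(word)
# Pair each leftover word with itself.
answer = [(word, word) for word in words]
print(answer)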
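
Since the comments say a regex is required for the class, here is a minimal sketch of a regex version, assuming Python's `re` module. The pattern in the question fails for two reasons: `\s+` only allows whitespace between the two words, so "party, party" is missed, and with a single capture group `re.findall` returns plain strings rather than tuples. The slashes around the pattern are also JavaScript-style delimiters and are not part of a Python pattern. The function name `find_duplicates` is made up for illustration.

```python
import re

def find_duplicates(text):
    # \b(\w+)\b : a whole word, captured as group 1
    # \W+       : one or more non-word characters (spaces, commas, ...)
    # (\1)\b    : the same word again, captured as group 2
    pattern = re.compile(r"\b(\w+)\b\W+(\1)\b", re.IGNORECASE)
    # With two capture groups, findall returns a list of 2-tuples.
    return pattern.findall(text)

the_text = "i went to to a party, party at my uncle's house"
print(find_duplicates(the_text))  # [('to', 'to'), ('party', 'party')]
```

Matches do not overlap, so a word repeated three times in a row yields one tuple, and `\w+` does not cover apostrophes, so words like "uncle's" are not matched as a whole.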
The shape