2

I want to find doubled word(s) in a text. I used (\w+) +\1 and it works, but it only finds "abc abc" in the text.

I also want to find "abc def abc def".

Thanks.

WhoSayIn
  • Which regexp are you using? What language? – gnarf Aug 11 '09 at 10:41
  • Can you actually do this in regular languages? Seems like this is as impossible as matching parentheses, another well-known impossibility in conventional regexpen – MSalters Aug 11 '09 at 10:54
  • It's quite possible with backreferences (notably not a feature of formal regex, but a feature of most modern regex engines). – Amber Aug 11 '09 at 10:57
  • @gnarf: I used it in PHP. @MSalters: as @Dav said, it's possible with backreferences; I used "\1". By the way, the solution is "(\w.*) +\1" – WhoSayIn Aug 11 '09 at 12:35

4 Answers

4

The following regex will match any repeated sequence of characters:

/(.+).*?\1/
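
To see how broad this first pattern is, here is a minimal sketch in Python's `re` module (an assumption on my part, since the question is about PHP; the constructs used here behave the same way in PCRE). The helper name `first_repeat` is hypothetical.

```python
import re

# The pattern from above, unchanged: capture any run of characters and
# require the same run to appear again later on the same line.
ANY_REPEAT = re.compile(r"(.+).*?\1")

def first_repeat(text):
    """Return (whole_match, repeated_part) for the first hit, or None."""
    m = ANY_REPEAT.search(text)
    return (m.group(0), m.group(1)) if m else None

print(first_repeat("abc def abc def"))  # ('abc def abc def', 'abc def')
print(first_repeat("banana"))           # ('anan', 'an')
print(first_repeat("xyz"))              # None
```

Note that "banana" matches: the pattern has no notion of words, so any repeated substring counts.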

If you only want repeated sequences that have nothing but whitespace in between, then use this instead:

/(.+)\s+?\1/

If you only want words separated by whitespace, change the (.+) to a (\w+):

/(\w+)\s+?\1/
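
One thing worth checking with this version is that a match may start in the middle of a word. A small sketch, again in Python's `re` as an assumption (the helper `first_hit` is a hypothetical name):

```python
import re

# The whitespace-separated pattern from above, unchanged.
WORD_REPEAT = re.compile(r"(\w+)\s+?\1")

def first_hit(text):
    """Return the text of the first match, or None if nothing matches."""
    m = WORD_REPEAT.search(text)
    return m.group(0) if m else None

print(first_hit("say abc abc again"))  # abc abc
print(first_hit("this is a test"))     # is is
```

The second call reports "is is" because the tail of "this" followed by "is" satisfies the pattern, even though no word is actually doubled.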

If you want to look at words while ignoring things like punctuation, word boundaries might be more useful:

/(\b\w+?\b)\W+?\b\1\b/
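
A sketch of the word-boundary idea, assuming the gap between the two copies is meant to be any run of non-word characters (written as `\W+` here, which is my assumption rather than something stated in the answer). It uses Python's `re`; `doubled_words` is a hypothetical helper name.

```python
import re

# Assumption: the separator is one or more non-word characters
# (spaces, commas, and so on). \b keeps matches aligned to whole words.
BOUNDED_REPEAT = re.compile(r"\b(\w+)\b\W+\b\1\b")

def doubled_words(text):
    """Return every word that is immediately repeated, in order."""
    return [m.group(1) for m in BOUNDED_REPEAT.finditer(text)]

print(doubled_words("It was, was it? this is fine."))  # ['was']
```

With the boundaries in place, "this is" is no longer reported, while "was, was" is found despite the comma.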
Amber
1

Not sure what you want it to match but it could be as simple as changing it to:

(\w+) +.*\1

The .* will match any extra characters that might be in between.

This will match the 'abc def abc' part of 'abc def abc def'. If you want to match all of it, change it to:

(\w+) +.*\1.*
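
The two patterns above can be compared side by side. This is a sketch in Python's `re` (an assumption, since the asker works in PHP); `matched_text` is a hypothetical helper.

```python
import re

# Both patterns from this answer, unchanged.
PARTIAL = re.compile(r"(\w+) +.*\1")
FULL = re.compile(r"(\w+) +.*\1.*")

def matched_text(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(matched_text(PARTIAL, "abc def abc def"))  # abc def abc
print(matched_text(FULL, "abc def abc def"))     # abc def abc def
print(matched_text(PARTIAL, "the cat sat"))      # at sat
```

The last call shows how loose the pattern is: the "at" in "cat" is followed by a space and then appears again inside "sat", so a line with no doubled word still matches.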

Salgar
  • Thanks for your answer, but it didn't work. I have now tried "((\w| )+) +\1" and it works! But it also matches runs of three or more spaces – WhoSayIn Aug 11 '09 at 10:45
1

Maybe "(\w.*) +\1"? Or is that too general for your needs?

"(\w+(?:\s+\w+)*) +\1" might work as well.
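
As a quick check of the second pattern, the sketch below runs it next to the "((\w| )+) +\1" attempt from the comments, which also matches plain runs of spaces. It is written in Python's `re` as an assumption; `repeated_phrase` is a hypothetical helper.

```python
import re

# The multi-word pattern from this answer, unchanged.
PHRASE_REPEAT = re.compile(r"(\w+(?:\s+\w+)*) +\1")
# The asker's attempt from the comments, kept for comparison.
SPACE_PRONE = re.compile(r"((\w| )+) +\1")

def repeated_phrase(pattern, text):
    """Return the captured repeated part of the first match, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(repr(repeated_phrase(PHRASE_REPEAT, "abc def abc def")))  # 'abc def'
print(repr(repeated_phrase(PHRASE_REPEAT, "a   b")))            # None
print(repr(repeated_phrase(SPACE_PRONE, "a   b")))              # ' '
```

Because the captured group must begin and end with a word character, three spaces in a row are no longer treated as a repeated "phrase".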

gnarf
1

Are you trying to delete the duplicates? You can also check this answer.

Community
pageman