4

I am trying to write a regex that starts at a certain word and ignores any preceding repetitions of that word.

For example, if my string starts with the word "dog" and ends with "fish", how do I ignore any preceding "dog" words and match only the last one?

dog cat fish

dog dog cat fish <- ignore the first "dog" and match the second "dog".

dog dog dog cat fish <- ignore the first and second "dog" and match the third "dog".

  • Is it possible you would have a string such as `dog dog dog cat fish dog dog fish cat`? – l'L'l Apr 12 '14 at 02:09
  • What are the constraints on the word at the start which may or may not repeat? e.g. Is it always 3 letters? Is it a word in a dictionary? (How would you resolve Do from Dog?) Is it always "dog"? – ClickRick Apr 12 '14 at 05:06

3 Answers

3

The following regex works:

(\b\w+\b |\b\w+\b$)(?!\1) with the m and g flags enabled

Demo: http://regex101.com/r/dW9fP5
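A minimal Python sketch of how this pattern might be applied, assuming the matches are collected with `re.findall` (standing in for the g flag) and joined back together. The helper name `drop_repeats` is hypothetical and not part of the answer.

```python
import re

# Matches a word (with its trailing space, or at end of line) that is
# not immediately followed by a copy of itself.
PATTERN = re.compile(r"(\b\w+\b |\b\w+\b$)(?!\1)", re.MULTILINE)

def drop_repeats(text: str) -> str:
    """Join every match of PATTERN found in text (hypothetical helper)."""
    return "".join(PATTERN.findall(text))

print(drop_repeats("dog dog dog cat fish"))  # dog cat fish
```

Only the last "dog" of the leading run survives, because each earlier "dog " is followed by another "dog " and fails the negative lookahead.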

As per your new request:

(\b\w+\b|\b\w+\b$)(?!\1) with the m and g flags enabled

sshashank124
  • This also works: `(\b\w+) (?!\1).*`. http://regex101.com/r/uM9zH7. Or more precisely: `(\b\w+) (?!\1)(?:\w+ )*\w+$`: http://regex101.com/r/jM1zY2 – aliteralmind Apr 12 '14 at 02:00
  • Wow, thanks for the quick response! Is there a way I could get this to work without spaces, ignoring every "dog" before the last one, as in dogdogcatfish? – user3525737 Apr 12 '14 at 02:29
  • @user3525737, What do you mean? Your string is __not__ separated by spaces? – sshashank124 Apr 12 '14 at 02:30
  • Yeah, when I posted originally I forgot to mention that they are not separated by spaces (one long string). I appreciate your help so far :D. – user3525737 Apr 12 '14 at 02:32
  • @user3525737, Sure you can, I have updated my answer. – sshashank124 Apr 12 '14 at 02:45
  • @user3525737: Please be careful to post all relevant information in your original question. This "Sixth Sense" kind of question (twist at the end that changes everything) can be pretty frustrating. – aliteralmind Apr 12 '14 at 02:52
2

To strip out space-separated duplicates:

dog dog dog cat cat fish:

(?>(\w+) (?=\1\b))+

test at: regex101, eval.in (if php)

The lookahead checks whether the text captured by the first parenthesized group appears again immediately ahead, right after the space.
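A rough Python sketch of the same idea, assuming the match is replaced with an empty string. Python's `re` module only accepts atomic groups `(?>...)` from version 3.11, so this sketch swaps in a plain non-capturing group `(?:...)`, which gives the same result on this input. The helper name `strip_duplicates` is hypothetical.

```python
import re

# (?:...) stands in for the atomic group (?>...) of the original pattern.
# Each iteration consumes a word plus space only when the same word follows.
DUPLICATES = re.compile(r"(?:(\w+) (?=\1\b))+")

def strip_duplicates(text: str) -> str:
    """Remove every word that is directly followed by a copy of itself."""
    return DUPLICATES.sub("", text)

print(strip_duplicates("dog dog dog cat cat fish"))  # dog cat fish
```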


To match duplicates only at string start, add the ^ anchor at the beginning:

dog dog dog cat cat fish

^(?>(\w+) (?=\1\b))+

test at regex101
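With the anchor, only the run at the start of the string is affected. A small Python sketch under the same assumptions as before (non-capturing group in place of the atomic group, hypothetical helper name `strip_leading_duplicates`):

```python
import re

# Anchored variant: the repeated group can only match from the string start.
LEADING_DUPLICATES = re.compile(r"^(?:(\w+) (?=\1\b))+")

def strip_leading_duplicates(text: str) -> str:
    """Remove repeated words at the start of text, leaving later runs alone."""
    return LEADING_DUPLICATES.sub("", text)

print(strip_leading_duplicates("dog dog dog cat cat fish"))  # dog cat cat fish
```

The "cat cat" pair stays because it does not begin at the start of the string.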


EDIT: The question has since changed to matching consecutive character sequences in one long string without spaces. The pattern is modified a bit to strip out repeated sequences of at least 3 characters at the start:

dogdogdogcatcatfish

^(?>(\w{3,})(?=\1))+

test at regex101


Replace the match with an empty string ("").
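A hedged Python sketch of the no-space variant with the empty-string replacement. As an assumption for portability, the atomic group is again swapped for a non-capturing group, since `(?>...)` needs Python 3.11 or later; `strip_leading_sequences` is a hypothetical name.

```python
import re

# A sequence of 3+ word characters is consumed only when an identical
# sequence follows it directly; repeated from the start of the string.
LEADING_SEQUENCES = re.compile(r"^(?:(\w{3,})(?=\1))+")

def strip_leading_sequences(text: str) -> str:
    """Drop repeated leading sequences, keeping the last copy."""
    return LEADING_SEQUENCES.sub("", text)

print(strip_leading_sequences("dogdogdogcatcatfish"))  # dogcatcatfish
```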

Regex FAQ

Jonny 5
1

Here's a simple (literal) pattern:

.*(dog)

Replace Pattern:

\1 

Not the most exciting, but might as well show it. The target word in parentheses is captured as group \1.

example: http://regex101.com/r/yU6xO8
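In Python, this literal approach could look roughly like the sketch below. The function `keep_last` and its parameters are hypothetical, and the word is passed through `re.escape` so it is treated literally.

```python
import re

def keep_last(word: str, text: str) -> str:
    """Replace everything up to and including the last `word` with `word`."""
    # Greedy .* runs to the end of the line, then backtracks to the last
    # occurrence of the word, which is captured as group 1.
    pattern = re.compile(r".*(" + re.escape(word) + r")")
    return pattern.sub(r"\1", text)

print(keep_last("dog", "dogdogcatfish"))         # dogcatfish
print(keep_last("dog", "dog dog dog cat fish"))  # dog cat fish
```

Because `.*` is greedy, anything before the final occurrence of the word is discarded, not only repeated copies of it.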

l'L'l