
For example:

$s1 = "Test Test the rest of string"
$s2 = "Test the rest of string"

I would like $s1 to match but not $s2, because the first word in $s1 is the same as the second. The word 'Test' is only an example; the regular expression should work for any word.

codaddict
user594791

4 Answers

if(preg_match('/^(\w+)\s+\1\b/',$input)) {
  // $input has same first two words.
}

Explanation:

^    : Start anchor
(    : Start of capturing group
 \w+ : One or more word characters (the first word)
)    : End of capturing group
\s+  : One or more whitespace characters
\1   : Back reference to the first word
\b   : Word boundary
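A minimal usage sketch, assuming PHP 7.0 or later (for the scalar type declarations). The helper name `firstTwoWordsMatch` is hypothetical and only wraps the pattern above. It checks the two strings from the question plus a case where the second word merely starts with the first, which the trailing `\b` rejects:

```php
<?php
// Hypothetical helper that wraps the pattern from the answer above.
function firstTwoWordsMatch(string $input): bool
{
    // preg_match returns 1 on a match, 0 on no match and false on error,
    // so comparing with === 1 turns the result into a plain boolean.
    return preg_match('/^(\w+)\s+\1\b/', $input) === 1;
}

$s1 = "Test Test the rest of string";
$s2 = "Test the rest of string";

var_dump(firstTwoWordsMatch($s1));           // bool(true)
var_dump(firstTwoWordsMatch($s2));           // bool(false)
// \b fails between "t" and "x", so a longer word is not a repeat.
var_dump(firstTwoWordsMatch("Test Testx"));  // bool(false)
```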
codaddict
~^(\w+)\s+\1(?:\W|$)~
~^(\pL+)\s+\1(?:\PL|$)~u // unicode variant

\1 is a back reference to the first capturing group.
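A small sketch of both patterns in PHP, assuming PHP 7.0 or later and a script file saved as UTF-8 (the `u` modifier expects UTF-8 subject strings). The helper names are hypothetical. The Unicode variant is exercised with a non-ASCII first word, where `\pL` (any letter) and `\PL` (any non-letter) do the work that `\w` and `\W` do in the ASCII version:

```php
<?php
// Hypothetical helpers wrapping the two patterns from the answer above.
function repeatsFirstWordAscii(string $s): bool
{
    // (?:\W|$): after the repeat, require a non-word character or the end.
    return preg_match('~^(\w+)\s+\1(?:\W|$)~', $s) === 1;
}

function repeatsFirstWordUnicode(string $s): bool
{
    // u: treat pattern and subject as UTF-8. \pL matches any Unicode
    // letter, \PL matches any character that is not a letter.
    return preg_match('~^(\pL+)\s+\1(?:\PL|$)~u', $s) === 1;
}

var_dump(repeatsFirstWordAscii("Test Test the rest of string")); // bool(true)
var_dump(repeatsFirstWordAscii("Test the rest of string"));      // bool(false)
var_dump(repeatsFirstWordUnicode("Über Über alles"));            // bool(true)
var_dump(repeatsFirstWordUnicode("Über Übermut"));               // bool(false)
```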

NikiC

This does not work in every regex flavor; see the comments.

^([^\b]+)\b\1\b
^(\B+)\b\1\b

Gets the first word, and matches if the same word is repeated after a word boundary.

poke
  • A `\b` in a character class is not a word boundary but a backspace character. – codaddict Jan 30 '11 at 15:15
  • @codaddict: Thanks, wasn't sure if it was like this or that :) – poke Jan 30 '11 at 15:16
  • [`\b` and `\B`](http://www.regular-expressions.info/wordboundaries.html) are zero width assertions, they will not match anything, certainly not next to each other. That said, you've inspired this: [What regular expression can never match? ](http://stackoverflow.com/questions/1845078/what-regular-expression-can-never-match/4850260#4850260) – Kobi Jan 31 '11 at 11:23
  • Bleh, I had tested my original solution (`^([^\b]+)\b\1\b`) in a language that does allow `\b` inside character classes (ActionScript), so matching any non-boundary did work; it matched the whole word including the following whitespace. Given that it doesn't work with `\B`, I would delete this answer now, but I'll keep it for that inspiration reference ;) – poke Jan 31 '11 at 11:38
  • It wasn't really non-boundary, in fact it was non-backspace! But then `\b` after that is a boundary and a little limiting, it means the first group, `\1`, must start with a word character and end with a non-word-character (or the other way around), so it will match `test!test!d`, for example, but not exactly for the reasons you think. `\1` includes the "space" here, which is `!`, which matches because it isn't a backspace! Here are a few more examples: http://rubular.com/r/1y239zNydK – Kobi Jan 31 '11 at 11:55
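The behaviour described in the comments can be checked directly in PHP's PCRE, where `\b` inside `[...]` is a backspace. A minimal sketch, assuming PHP 7.1 or later (for the nullable return type); the helper `capturedRepeat` is hypothetical and returns whatever group 1 captured. It shows that the "word" captured by `[^\b]+` includes the trailing separator, whether that is `!` or a space:

```php
<?php
// Hypothetical helper: returns group 1 of the pattern, or null on no match.
function capturedRepeat(string $s): ?string
{
    // Inside [...], PCRE reads \b as a backspace (\x08), so [^\b]+ means
    // "one or more characters that are not a backspace". Only the two \b
    // escapes outside the class are word-boundary assertions.
    if (preg_match('/^([^\b]+)\b\1\b/', $s, $m) === 1) {
        return $m[1];
    }
    return null;
}

var_dump(capturedRepeat('test!test!d'));                  // string(5) "test!"
var_dump(capturedRepeat('Test Test the rest of string')); // string(5) "Test "
var_dump(capturedRepeat('Test Testx'));                   // NULL
```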

This does not return true for 'Test Testx'.

$string = "Test Test";

preg_match('/^(\w+)\s+\1(\b|$)/', $string);
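A short sketch that runs the pattern above against a few inputs, assuming PHP 7.0 or later. The helper name `repeatsFirstWord` is hypothetical:

```php
<?php
// Hypothetical helper wrapping the pattern from this answer.
function repeatsFirstWord(string $s): bool
{
    // (\b|$): the repeated word must end at a word boundary or at the end
    // of the string, so "Testx" is not accepted as a repeat of "Test".
    return preg_match('/^(\w+)\s+\1(\b|$)/', $s) === 1;
}

var_dump(repeatsFirstWord("Test Test"));   // bool(true)
var_dump(repeatsFirstWord("Test Test.")); // bool(true)
var_dump(repeatsFirstWord("Test Testx")); // bool(false)
```

Because `\1` always ends in a word character, `\b` on its own already matches at the end of the string, so the `|$` alternative does not change which inputs match.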