First, we need a regex to match words. The simplest is to use the word metacharacter class, \w. Note that this matches not only letters but also digits and the underscore; if that's not the appropriate character class, you'll need to substitute something else (such as the Unicode letter category, \p{L}).
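To see what \w actually accepts, here is a minimal Python sketch. Python's built-in re module does not support \p{L}, so the letters-only class [^\W\d_] is swapped in here as a stand-in; that substitution and the sample string are assumptions for illustration, not part of the original approach.

```python
import re

# Hypothetical sample text for illustration.
sample = "café x_1 42 word"

# \w matches letters, digits, and the underscore.
word_chars = re.findall(r"\w+", sample)

# Letters only: "a word character that is neither a digit nor an underscore".
# This stands in for \p{L}, which Python's re module does not provide.
letters_only = re.findall(r"[^\W\d_]+", sample)

print(word_chars)
print(letters_only)
```

Engines that do support \p{L} (PCRE, Java, .NET, and others) can use it directly in place of the workaround.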
Next, we need a regex that matches a full word, which is fairly straightforward. Simply match a sequence of word characters, anchored by word boundaries:
\b\w+\b
Next, capture the first letter:
\b(\w)\w*\b
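As a quick check of these two building blocks, the following Python sketch runs both patterns over a hypothetical sample string (the text and variable names are illustrative assumptions):

```python
import re

# Hypothetical sample text for illustration.
text = "alpha beta gamma"

# Whole words: a run of word characters between word boundaries.
whole_words = re.findall(r"\b\w+\b", text)

# Same words, but with the first letter captured in group 1.
first_letters = [m.group(1) for m in re.finditer(r"\b(\w)\w*\b", text)]

print(whole_words)
print(first_letters)
```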
Finally, use a negative lookaround to negate the backreference. With some engines, you can do this with a lookbehind:
\b(\w)\w*(?<!\1)\b
When working with multi-character patterns that break parts down into sub-patterns, it's important to consider the quantifiers. Note that \w\w* is equivalent to \w+, and so will match words of 1 or more letters. The (?<!\1) applies at the end of the word (here, to the last letter). For 1-letter words, the first and last letter are the same, so the pattern will always fail, which is desirable (so 'a' will never be matched). For words of 2 or more letters, it compares letters in different positions, which is also desirable. Thus \w\w* works as a base pattern. Note that \w\w+ would also work.
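The length analysis can be checked with a small Python sketch. It assumes an engine that accepts a backreference inside a lookbehind; CPython's re module appears to allow this when the referenced group is already closed and has a fixed width, but other engines may reject the pattern. The word list is a hypothetical test set.

```python
import re

# Assumption: the engine allows \1 inside a lookbehind.
pattern = re.compile(r"\b(\w)\w*(?<!\1)\b")

# Hypothetical test words covering lengths 1, 2, and 3.
words = ["a", "ab", "aa", "aba", "abc"]

# Keep only the words the pattern matches in full.
matched = [w for w in words if pattern.fullmatch(w)]

print(matched)
```

The single-letter word and the words that start and end with the same letter are rejected; only words whose first and last letters differ survive.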
Some engines place restrictions on lookbehinds, such as not allowing backreferences in them. In this case, the pattern can be rewritten to use a lookahead placed before the last letter, which must therefore be separated out (as the first letter was):
\b(\w)\w*(?!\1)\w\b
Again, this should be examined in terms of length (left as an exercise).
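For a concrete feel of the lookahead variant, here is a minimal Python sketch run over a hypothetical sentence (the sample text is an assumption for illustration):

```python
import re

# Lookahead variant: the last letter is matched separately,
# and (?!\1) checks it just before it is consumed.
pattern = re.compile(r"\b(\w)\w*(?!\1)\w\b")

# Hypothetical sample text for illustration.
text = "a level word data aa ab"

matches = [m.group(0) for m in pattern.finditer(text)]

print(matches)
```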
Finally, make sure to set any relevant flags, such as case-insensitivity.
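The effect of the case-insensitivity flag can be sketched in Python as follows. The sample text is a hypothetical example; re.IGNORECASE makes the backreference comparison ignore case as well.

```python
import re

pattern_src = r"\b(\w)\w*(?!\1)\w\b"

# Hypothetical sample text for illustration.
text = "Anna went to Alabama"

# Case-sensitive: 'A' and 'a' count as different letters.
case_sensitive = [m.group(0) for m in re.finditer(pattern_src, text)]

# Case-insensitive: 'A' and 'a' count as the same letter.
case_insensitive = [
    m.group(0) for m in re.finditer(pattern_src, text, re.IGNORECASE)
]

print(case_sensitive)
print(case_insensitive)
```

Without the flag, "Anna" and "Alabama" are reported as having different first and last letters, which is probably not what is wanted.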