
I know how to match a string having the same first and last character using:

/^(.).*\1$/
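As a quick illustration, here is the pattern above applied with Python's `re` module. The question is tagged for both JavaScript and Python, so using Python here is an assumption, and `same_first_last` is a hypothetical helper name:

```python
import re

# Capture the first character, then require the same character (\1)
# at the very end of the string.
SAME_ENDS = re.compile(r"^(.).*\1$")

def same_first_last(s: str) -> bool:
    """Return True if s starts and ends with the same character."""
    return SAME_ENDS.search(s) is not None

print(same_first_last("bgb"))  # True
print(same_first_last("abc"))  # False
print(same_first_last("a"))    # False: \1 needs a second 'a'
```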

I want the opposite requirement for words: a regex to match words where the first and last letter are different. For example, abc should match since 'a' and 'c' are different, and bgb should fail since it begins and ends with 'b'.

I tried with /^(.).*(?!\1)$/, but it had both false positives and negatives (it matched when it shouldn't, and didn't match when it should).
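A short Python sketch suggests one reason the attempt gives false positives. It assumes Python's `re` semantics. The lookahead sits at the end of the string, where nothing follows, so `(?!\1)` can never fail:

```python
import re

# The attempted pattern. At the end of the string there is no character
# left for (?!\1) to reject, so it always succeeds and the pattern
# effectively reduces to ^(.).*$ (any non-empty line).
ATTEMPT = re.compile(r"^(.).*(?!\1)$")

for word in ["abc", "bgb"]:
    print(word, bool(ATTEMPT.search(word)))
# abc True
# bgb True   <- should have failed
```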

What regex would match words where the first and last letters are different?

outis
  • If you don't want to match the beginning and end of a string, removing the `^` (match-beginning-of-string) and `$` (match-end-of-string) characters would be a good start. https://regex101.com/ is a great tool for understanding regex. Also probably good to pick one language tag or explain how this question is both for JS & python regex. – CollinD Oct 09 '22 at 04:16
  • i think he is saying the word can not end with the same letter it startswith ... – Joran Beasley Oct 09 '22 at 04:17
  • any reason this is tagged both js and pyth? it demonstrates usage of neither code. – rv.kvetch Oct 09 '22 at 04:18
  • @rv.kvetch it looks like those regex are js style regex ... but your point stands imho – Joran Beasley Oct 09 '22 at 04:20
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – user11717481 Oct 09 '22 at 04:37
  • @ellhe-blaster Actually the OP's requirement is very clear. Not exactly clear which language is being used. – Tim Biegeleisen Oct 09 '22 at 04:43

2 Answers


A negative lookahead as you were attempting can be made to work:

^(.).*(?!\1).$

This pattern says to match:

  • ^ from the start of the string
  • (.) match any first character and capture it in \1
  • .* match zero or more additional characters
  • (?!\1) assert that the last character is NOT \1
  • . match any last character (which cannot be \1)
  • $ end of the string
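The pattern can be checked with a minimal Python sketch. Using Python's `re` module is an assumption, since the answer doesn't name a language. Each string is tested whole, because the pattern is anchored with `^` and `$`:

```python
import re

# ^(.)    capture the first character
# .*      any characters in between
# (?!\1)  the next character must not be the captured one...
# .$      ...and that character must be the last one in the string
DIFFERENT_ENDS = re.compile(r"^(.).*(?!\1).$")

for s in ["abc", "bgb", "ab", "a"]:
    print(s, bool(DIFFERENT_ENDS.search(s)))
# abc True
# bgb False
# ab True
# a False
```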


Tim Biegeleisen

Firstly, we need a regex to match words. The simplest is to use the word metacharacter class, \w. Note that this matches not only letters but also digits and the underscore; if that isn't the appropriate character class, you'll need to substitute something else (such as the Unicode letter category \p{L}, in engines that support Unicode property escapes).
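Here is a small Python sketch of the difference. It assumes Python 3's `re` module with `str` patterns, where `\w` is Unicode-aware. Python's `re` doesn't support `\p{L}`, so the letters-only class below is `[^\W\d_]`. That class is a common substitute and does not come from the answer:

```python
import re

text = "abc a_1 b2 café"

# \w matches letters, digits and the underscore (Unicode-aware for str patterns).
word_runs = re.findall(r"\w+", text)
print(word_runs)    # ['abc', 'a_1', 'b2', 'café']

# Letters only: word characters minus digits and the underscore.
letter_runs = re.findall(r"[^\W\d_]+", text)
print(letter_runs)  # ['abc', 'a', 'b', 'café']
```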

Next, a regex that will match a full word, which is fairly straightforward. Simply match a sequence of word characters, anchored by word boundaries:

\b\w+\b
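For instance, with Python's `re` module (an assumption). The sample sentence is taken from the comments below:

```python
import re

sentence = "Socrates says to match some words."

# \b\w+\b: a run of word characters between two word boundaries.
words = re.findall(r"\b\w+\b", sentence)
print(words)  # ['Socrates', 'says', 'to', 'match', 'some', 'words']
```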

Next, capture the first letter:

\b(\w)\w*\b
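In Python (assumed here), `re.finditer` exposes both the whole word and its captured first letter:

```python
import re

sentence = "Socrates says to match some words."

# Group 0 is the whole word; group 1 is its captured first letter.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r"\b(\w)\w*\b", sentence)]
for word, first in pairs:
    print(word, first)
```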

Finally, use a negative lookaround to negate the backreference. With some engines, you can do this with a lookbehind:

\b(\w)\w*(?<!\1)\b
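Here is a Python sketch of the lookbehind version. My understanding is that Python's `re` has accepted a backreference inside a lookbehind since 3.5, provided the referenced group has a fixed width. Here the group is `(\w)`, one character. Other engines may reject this:

```python
import re

# (?<!\1) looks back at the last character consumed by \w* and rejects
# the match if it equals the captured first letter.
LOOKBEHIND = re.compile(r"\b(\w)\w*(?<!\1)\b")

sentence = "Socrates says to match some words."
matches = [m.group(0) for m in LOOKBEHIND.finditer(sentence)]
print(matches)  # ['Socrates', 'to', 'match', 'some', 'words']
```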

When a pattern is broken down into sub-patterns that each match part of a word, it's important to consider the quantifiers. Note that \w\w* is equivalent to \w+, and so will match words of one or more letters. The (?<!\1) applies at the end of the word (here, it checks the last letter). For one-letter words, the first and last letter are the same, so the pattern will always fail, which is desirable (so 'a' will never be matched). For words of two or more letters, it compares letters in different positions, which is also desirable. Thus \w\w* works as a base pattern. Note that \w\w+ would also work.
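The length reasoning can be checked directly. This Python sketch tests whole words against the lookbehind pattern and the `\w\w+` variant. The helper `first_last_differ` is hypothetical, and Python's `re` is assumed to accept the lookbehind backreference (3.5+):

```python
import re

# Lookbehind pattern from the answer, and the \w\w+ variant (minimum two letters).
LOOKBEHIND = re.compile(r"\b(\w)\w*(?<!\1)\b")
LOOKBEHIND_PLUS = re.compile(r"\b(\w)\w+(?<!\1)\b")

def first_last_differ(pattern, word):
    """True if the compiled `pattern` matches the whole of `word`."""
    return pattern.fullmatch(word) is not None

for w in ["a", "aa", "ab", "abca", "abcd"]:
    print(w, first_last_differ(LOOKBEHIND, w), first_last_differ(LOOKBEHIND_PLUS, w))
# a False False      (one letter: the lookbehind re-checks the captured letter)
# aa False False
# ab True True
# abca False False
# abcd True True
```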

Some engines place restrictions on lookbehinds, such as not allowing backreferences in them. In this case, the pattern can be rewritten to use a lookahead placed before the last letter, which must therefore be matched separately (as the first letter was):

\b(\w)\w*(?!\1)\w\b

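Here is the lookahead version in Python (again an assumption about the engine). It needs no backreference inside a lookbehind and should find the same words as the lookbehind version:

```python
import re

# The final \w matches the last letter separately; (?!\1) checks it
# before it is consumed, so no lookbehind is needed.
LOOKAHEAD = re.compile(r"\b(\w)\w*(?!\1)\w\b")

sentence = "Socrates says to match some words."
matches = [m.group(0) for m in LOOKAHEAD.finditer(sentence)]
print(matches)  # ['Socrates', 'to', 'match', 'some', 'words']
```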

Again, this should be examined in terms of length (left as an exercise).

Finally, make sure to set any relevant flags, such as case-insensitivity.
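In Python's `re` (assumed here), the case-insensitive flag is `re.IGNORECASE`. It also makes the backreference compare letters case-insensitively:

```python
import re

PATTERN = r"\b(\w)\w*(?!\1)\w\b"

# Case-sensitive by default: 'A' and 'a' count as different letters.
print(bool(re.fullmatch(PATTERN, "Abba")))                 # True
# With re.IGNORECASE, \1 ('A') also matches 'a', so the word is rejected.
print(bool(re.fullmatch(PATTERN, "Abba", re.IGNORECASE)))  # False
```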

outis
  • Your second pattern is basically identical to my answer. – Tim Biegeleisen Oct 09 '22 at 08:03
  • @TimBiegeleisen: the regex is notably different, due to the use of character classes instead of the "any" metacharacter and different anchors. Their behavior on sentences (e.g. "Socrates says to match some words.") illustrates the difference. Moreover, this answer has a different structure, showing the steps to derive the pattern rather than starting with the pattern and explaining the parts. What are you getting at with your comment? – outis Oct 09 '22 at 08:22
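The difference described in this comment can be reproduced with a Python sketch, assuming Python's `re`. The `^(.).*(?!\1).$` pattern treats the whole sentence as one string. The `\b`/`\w` pattern checks each word separately:

```python
import re

sentence = "Socrates says to match some words."

# Anchored with ^...$ and ".": compares 'S' with the final '.', so the
# entire sentence matches as a single unit.
whole = re.search(r"^(.).*(?!\1).$", sentence)
print(whole.group(0) if whole else None)
# Socrates says to match some words.

# Word boundaries and \w: each word is tested on its own.
per_word = [m.group(0) for m in re.finditer(r"\b(\w)\w*(?!\1)\w\b", sentence)]
print(per_word)  # ['Socrates', 'to', 'match', 'some', 'words']
```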