3

Could you please help me in writing a pure regex expression to find the first letter that doesn't repeat in a string? I thought I might possibly need to use negative lookahead and negative lookbehind, but I don't think javascript supports lookbehind.

For e.g.

'NUNUMUNN'        // expected output 'M'
'LOMING'          // expected output 'L'

I think it's possible to do this using general string operations, but my preference is really for pure regex.

My starting point is currently this:

/(a-zA-Z).*?(?!\1)/.match('thestring');

But it doesn't work.

AKS
  • 18,983
  • 3
  • 43
  • 54
  • Check this thread [http://stackoverflow.com/questions/2285533/find-the-first-un-repeated-character-in-a-string](http://stackoverflow.com/questions/2285533/find-the-first-un-repeated-character-in-a-string) – Thirumal May 11 '16 at 04:56

1 Answers1

3

Turn your logic around: First match all the letters in a word that do repeat, then match the next letter - that's the one you need to look at. Then there are some edge cases to consider.

/\b(?:(?:([a-z])(?=[a-z]*\1))+(?!\1))?([a-z])(?![a-z]*\2)/ig

Explanation:

\b          # Start of word
(?:         # Start of non-capturing group (optional, see below)
 (?:        # Start of non-capturing group that matches...
  ([a-z])   # ...and captures any ASCII letter
  (?=       # if it's followed by
   [a-z]*   #  zero or more letters 
   \1       #  and the same letter again.
  )         # (end of lookahead assertion)
 )+         # Repeat at least once
 (?!\1)     # Assert that this letter doesn't follow immediately to avoid matching "AAA"
)?          # Make that group optional (in case there are no repeated letters in the word)
([a-z])     # Then match a letter and capture it in group 2.
(?![a-z]\2) # and make sure that letter doesn't immediately repeat either.

Note that you will need to look at group 2 of a match in order to get the result - group 1 will contain whatever comes before the first non-repeating letter.

Test it live on regex101.com.

Tim Pietzcker
  • 328,213
  • 58
  • 503
  • 561
  • Hmm. When testing this on https://regex101.com/r/dR3jQ6/1, it works with the default PCRE flavor but fails in the JavaScript tag. Same thing with RegexBuddy; apparently JS treats the empty backreference within a lookahead assertion differently...need to look at this a bit more... – Tim Pietzcker May 11 '16 at 05:17
  • This does not seem to work for me. `"ZAZ".match(r)` is giving me `["ZA"]`, what am I missing? –  May 11 '16 at 05:20
  • @torazaburo: The regex matches all the repeating letters (`A` in this case) and then the first non-repeating letter (`Z`). Look at group 2 for the actual result. However, there are other issues with this regex that I'm addressing at the moment - will post an edit soon. – Tim Pietzcker May 11 '16 at 05:36
  • Oh my god, I could never have come up with this! Thank you so much! ^^ Is there any place you'd recommend for me to go to for learning regex at this level? – apriltran95 May 11 '16 at 15:03
  • 1
    edit: It doesn't seem to return 'A' from the string 'XXXXXA', and it does return 'A' from the string 'AXXXXA'? – apriltran95 May 11 '16 at 15:56
  • @apriltran95: Right...it also fails on "BXXAB" or "BXXBA". The problem is that the regex trips up if the first unrepeated character comes after a character that has just been repeated for the last time (i. e., it should be excluded but not because there is a repetition *ahead* but because there is one *behind* it). The problem with JavaScript regexes is that they don't support lookbehind assertions, so I currently don't see a way to reliably handle this edge case. Many modern regex engines are kind of context-aware and could handle this, but not JavaScript... – Tim Pietzcker May 11 '16 at 20:04