
Is it possible to construct a PCRE-style regular expression that matches each letter in a list at most once?

For example, if you have the letters "lrsa" and you try matching a word list against:

^[lrsa]*m[lrsa]*$

you're going to match "lams" (valid), but also "lamas" (invalid for our purposes because you only have one "a"). If your letter set were "lrsaa", you would want to match "lamas".

Is this possible with regular expressions, or should I handle it programmatically?

gtcaz
  • This won’t work as `[lrsaa]` is equal to `[lrsa]`. – Gumbo Apr 13 '10 at 16:53
  • Right, and that's my issue. You can limit with [lrsa]{4} but that will still match "lass", for example. – gtcaz Apr 13 '10 at 16:55
  • What you can do is match both the ones you want and some extras you don't. With an iteration of your matches, it would be trivial to filter out the unwanted extras. – erisco Apr 13 '10 at 17:00
  • Do you mean "lmsa", rather than "lrsa"? Otherwise, it wouldn't match "lams". – James Kolpack Apr 13 '10 at 17:19
  • In my example above I was matching against ^[lrsa]*m[lrsa]*$ (note the "m"). Think Scrabble where you have a rack of letters you need to play off an existing letter. – gtcaz Apr 13 '10 at 17:41
  • By the way, here's a handy regex cheat sheet by Alexander Stigsen (of e-texteditor fame): http://opencompany.org/download/regex-cheatsheet.pdf (pdf). E is really handy for testing regexes because it shows you live results of your matches. – gtcaz Apr 13 '10 at 18:32
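
erisco's suggestion (over-match with the regex, then filter out the extras) could look roughly like this in Python. This is only a sketch under assumptions: the function name `playable`, the `anchor` parameter, and the sample words are made up for illustration and do not come from the thread.

```python
import re
from collections import Counter

def playable(word, rack, anchor="m"):
    """True if `word` is built from the anchor letter plus rack letters,
    using each rack letter no more often than the rack contains it."""
    # Loose regex from the question: it over-matches repeated letters.
    cls = re.escape(rack)
    loose = re.compile(rf"^[{cls}]*{re.escape(anchor)}[{cls}]*$")
    if not loose.match(word):
        return False
    # Filter step: compare letter counts against the rack plus the anchor.
    available = Counter(rack) + Counter(anchor)
    used = Counter(word)
    return all(used[ch] <= available[ch] for ch in used)

words = ["lams", "lamas", "slam", "alarms"]
print([w for w in words if playable(w, "lrsa")])   # ['lams', 'slam']
print([w for w in words if playable(w, "lrsaa")])  # all four words
```

The regex only narrows the candidates; the `Counter` comparison is what enforces the "one tile per letter" rule, so a duplicated letter in the rack ("lrsaa") is handled naturally.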

1 Answer


You can use negative look-ahead:

^(?!.*?(.).*?\1)[lrsa]*m[lrsa]*$

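will do what you want: the look-ahead `(?!.*?(.).*?\1)` rejects any string in which some character occurs more than once, so it works as long as every letter in your set is unique.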
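
A quick way to check this pattern is to run it over a few words. The sketch below uses Python's `re` module, which accepts the same look-ahead and backreference syntax; the word list is made up for illustration.

```python
import re

# The regex from above, unchanged.
unique_letters = re.compile(r"^(?!.*?(.).*?\1)[lrsa]*m[lrsa]*$")

# "lamas" repeats "a"; "lass" repeats "s" and has no "m".
for word in ["lams", "lamas", "lass", "slam"]:
    print(word, bool(unique_letters.match(word)))
# lams True
# lamas False
# lass False
# slam True
```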

ZyX
  • Yes, that does work where each letter is unique. Very helpful. (I need to sort through that and figure out how it works. I'm reading this as well: http://stackoverflow.com/questions/1749437/regular-expression-negative-lookahead) What about where there is more than one occurrence of a letter, e.g. "abbcde", and you want to match "babe" but not "dade"? Is that possible? – gtcaz Apr 13 '10 at 17:49
  • I am not sure that I understood you correctly, but maybe this will do the trick: `^(?!.*?(d).*?\1)\w+$` – ZyX Apr 14 '10 at 02:45
  • I tried this with `grep -P` and got the error "grep: unrecognized character after (? or (?-". Is there a version of the solution that will work with grep? – ianinini Nov 09 '19 at 12:59
  • @ianinini This regex works with my grep just fine. I have grep-3.1 and libpcre-8.42. – ZyX Nov 09 '19 at 16:27
  • @ZyX - you are correct. At least when I use single quotes it works on my system. If I use double quotes something goes horribly wrong. (I had fun getting there... built pcre2 from source and still hit the same errors.) – ianinini Nov 09 '19 at 17:31
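
For the follow-up case where a letter may appear more than once in the set ("abbcde" should allow "babe" but not "dade"), one possible generalization is to generate one negative look-ahead per distinct letter, each forbidding one more occurrence than the set contains. This is a sketch, not something posted in the thread: the helper name `rack_regex` is hypothetical, and the pattern is built in Python, although it uses only look-aheads and counted groups, which PCRE also supports.

```python
import re
from collections import Counter

def rack_regex(rack):
    """Build a regex matching words that use each letter of `rack`
    at most as many times as it appears in `rack`."""
    counts = Counter(rack)
    # One guard per distinct letter: with n copies available,
    # forbid n + 1 occurrences, e.g. (?!(?:.*b){3}) for two b's.
    guards = "".join(
        rf"(?!(?:.*{re.escape(ch)}){{{n + 1}}})" for ch, n in counts.items()
    )
    letters = "".join(re.escape(ch) for ch in counts)
    return re.compile(rf"^{guards}[{letters}]+$")

pattern = rack_regex("abbcde")
for word in ["babe", "dade", "abbe", "ebb"]:
    print(word, bool(pattern.match(word)))
# babe True
# dade False
# abbe True
# ebb True
```

The generated pattern grows with the number of distinct letters, so for large letter sets a plain count comparison in code may be easier to read and maintain.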