Using a regular expression, I want to select only the words which:

  • are alphanumeric
  • do not consist only of digits
  • do not consist only of letters
  • contain one or more digits, each of them unique within the word

I am not very good with regex, but so far I have tried [^\d\s]*(\d+)(?!.*\1), which gets me nowhere close to the desired output :(

Here are the input strings:

I would like abc123 to match but not 123.
ab12s should also match
Only number-words like 1234 should not match
Words containing same numbers like ab22s should not match
234 should not match
hel1lo2haha3hoho4
hel1lo2haha3hoho3

Expected Matches:

abc123
ab12s
hel1lo2haha3hoho4
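
The four requirements can also be written down as a plain predicate, which is handy as a reference when checking any regex against the sample input. This is only a sketch: the names `qualifies` and `select_words` are made up for illustration, and splitting on whitespace and stripping trailing punctuation is an assumption about what counts as a "word".

```python
def qualifies(word):
    """Return True if `word` meets all four requirements from the question."""
    if not (word.isascii() and word.isalnum()):
        return False                          # letters and digits only
    digits = [ch for ch in word if ch.isdigit()]
    letters = [ch for ch in word if ch.isalpha()]
    if not digits or not letters:
        return False                          # needs at least one of each
    return len(digits) == len(set(digits))    # no digit may repeat


def select_words(text):
    # Assumption: words are whitespace-separated, with trailing
    # punctuation such as the "." in "123." stripped off.
    words = (w.strip(".,;:!?") for w in text.split())
    return [w for w in words if qualifies(w)]


SAMPLE = """\
I would like abc123 to match but not 123.
ab12s should also match
Only number-words like 1234 should not match
Words containing same numbers like ab22s should not match
234 should not match
hel1lo2haha3hoho4
hel1lo2haha3hoho3
"""

print(select_words(SAMPLE))
```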
CertainPerformance
ManJoey

4 Answers

You can use

\b(?=\d*[a-z])(?=[a-z]*\d)(?:[a-z]|(\d)(?!\w*\1))+\b

https://regex101.com/r/TimjdW/3

Anchor the start and end of the pattern at word boundaries with \b, then:

  • (?=\d*[a-z]) - Lookahead asserting that a letter follows any leading digits, so the word contains at least one letter
  • (?=[a-z]*\d) - Lookahead asserting that a digit follows any leading letters, so the word contains at least one digit
  • (?:[a-z]|(\d)(?!\w*\1))+ - Repeatedly match either:
    • [a-z] - Any alphabetical character, or
    • (\d)(?!\w*\1) - A digit which does not occur again later in the same word
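
For anyone trying this pattern outside regex101, here is a minimal Python sketch. Python itself, the `re.IGNORECASE` flag, and the helper name `find_words` are assumptions for illustration; the answer only gives the pattern. One thing to watch: because the pattern contains a capture group, `re.findall` would return the captured digit rather than the whole word, so the sketch uses `finditer` and `group(0)`.

```python
import re

# Pattern copied from the answer. re.IGNORECASE is an assumption so that
# upper-case letters are accepted by [a-z] as well.
PATTERN = re.compile(
    r"\b(?=\d*[a-z])(?=[a-z]*\d)(?:[a-z]|(\d)(?!\w*\1))+\b",
    re.IGNORECASE,
)


def find_words(text):
    # group(0) is the whole match; group(1) would only be the last digit
    # captured by (\d).
    return [m.group(0) for m in PATTERN.finditer(text)]


print(find_words("I would like abc123 to match but not 123."))
print(find_words("Words containing same numbers like ab22s should not match"))
```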
CertainPerformance

Here is a slightly shorter and faster regex, since it doesn't assert a negative lookahead for each character:

/\b(?=[a-z]*\d)(?=\d*[a-z])(?!\w*(\d)\w*\1)[a-z\d]+\b/ig

RegEx Details:

  • \b: Word boundary
  • (?=[a-z]*\d): Make sure we have at least one digit
  • (?=\d*[a-z]): Make sure we have at least one letter
  • (?!\w*(\d)\w*\1): Make sure no digit is repeated anywhere in the word
  • [a-z\d]+: Match one or more alphanumeric characters
  • \b: Word boundary
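
The `/.../ig` literal above is JavaScript-style. As a rough sketch of the same pattern in Python (an assumption, since the answer does not name a language): the `i` flag becomes `re.IGNORECASE`, and the `g` flag corresponds to scanning with `finditer`. The names `UNIQUE_DIGIT_WORD` and `matches` are hypothetical.

```python
import re

# "i" flag -> re.IGNORECASE; "g" flag -> iterate over all matches below.
UNIQUE_DIGIT_WORD = re.compile(
    r"\b(?=[a-z]*\d)(?=\d*[a-z])(?!\w*(\d)\w*\1)[a-z\d]+\b",
    re.IGNORECASE,
)


def matches(text):
    # Collect the full text of every match, left to right.
    return [m.group(0) for m in UNIQUE_DIGIT_WORD.finditer(text)]


print(matches("abc123 123 ab22s AB12s"))  # ['abc123', 'AB12s']
```

Here the uniqueness check runs once per candidate word, at the word boundary, instead of once per digit inside the repeated group.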
anubhava
  • @ManJoey: I believe that the selected regex will be slower for bigger strings – anubhava Feb 02 '19 at 07:37
  • @anubhava, does `\1` in `(?!\w*(\d)\w*\1)` refer to the `(\d)` saved in the first capture buffer (as in `sed`)? Sorry, I am still learning regex, so I wanted to check with you. – RavinderSingh13 Feb 02 '19 at 07:39
  • Yes, that's correct. `\1` is a back-reference to captured group #1, i.e. `(\d)` – anubhava Feb 02 '19 at 07:41
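
To see the back-reference on its own, here is a small Python sketch using just the `(\d)\w*\1` part of the lookahead. The helper name `repeated_digit` is made up for illustration.

```python
import re

# (\d) captures one digit into group 1; \1 then requires that exact
# digit to appear again after any run of word characters.
REPEATED_DIGIT = re.compile(r"(\d)\w*\1")


def repeated_digit(word):
    """Return the first digit that occurs twice in `word`, or None."""
    m = REPEATED_DIGIT.search(word)
    return m.group(1) if m else None


for word in ["ab22s", "hel1lo2haha3hoho3", "abc123"]:
    print(word, "->", repeated_digit(word))
```

Wrapping this in `(?! ... )` turns "a digit repeats" into "no digit repeats", which is how the answer uses it.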

You could assert all the conditions using one negative lookahead:

\b(?![a-z]+\b|\d+\b|\w*(\d)\w*\1)[a-z\d]+\b

The important part is to start the match at \b and immediately rule out the disqualifying cases:

  • [a-z]+\b - Only alphabetic

  • \d+\b - Only numeric

  • \w*(\d)\w*\1 - Has a repeating digit
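
One way to convince yourself the three alternatives cover everything is to try them one at a time. The Python sketch below does that for single alphanumeric words and compares the result with the combined pattern. The names `REJECTIONS`, `COMBINED` and `explain`, and the `re.IGNORECASE` flag, are assumptions for illustration.

```python
import re

FLAGS = re.IGNORECASE  # assumption: letters of either case count as [a-z]

# The three alternatives from the negative lookahead, one at a time.
REJECTIONS = {
    "only letters": re.compile(r"\b[a-z]+\b", FLAGS),
    "only digits": re.compile(r"\b\d+\b", FLAGS),
    "repeated digit": re.compile(r"\b\w*(\d)\w*\1", FLAGS),
}

COMBINED = re.compile(r"\b(?![a-z]+\b|\d+\b|\w*(\d)\w*\1)[a-z\d]+\b", FLAGS)


def explain(word):
    """Return why a single alphanumeric word is rejected, or None if it is accepted."""
    for reason, pattern in REJECTIONS.items():
        if pattern.match(word):  # match() anchors at the start of the word
            return reason
    return None


for word in ["hello", "1234", "ab22s", "abc123"]:
    print(word, "->", explain(word), "| combined:", bool(COMBINED.fullmatch(word)))
```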

revo
  • Thanks, good fix, though I'm not sure why a single negative lookahead with alternations shows more steps on regex101 than multiple assertions (as in my answer) – anubhava Feb 02 '19 at 07:07
  • @anubhava Yes, it depends on the input string. For example, consider switching the first two lookaheads in your solution. You'll see [the step count increases](https://regex101.com/r/avSYuv/2). – revo Feb 02 '19 at 07:10

You can use this:

\b(?!\w*(\d)\w*\1)(?=(?:[a-z]+\d+)|(?:\d+[a-z]+))[a-z0-9]+\b

  • \b - Word boundary.
  • (?!\w*(\d)\w*\1) - Checks that no digit is repeated in the word.
  • (?=(?:[a-z]+\d+)|(?:\d+[a-z]+)) - Checks that the word starts with letters followed by a digit, or digits followed by a letter, so it contains both.
  • [a-z0-9]+ - Matches one or more characters from a to z or 0 to 9.
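
Since the thread now has four different patterns for the same job, a quick cross-check is useful. The Python sketch below runs all four against the sample input from the question and compares the results. Python, the `re.IGNORECASE` flag, and the name `all_matches` are assumptions; the dictionary keys are just the answer authors.

```python
import re

SAMPLE = """\
I would like abc123 to match but not 123.
ab12s should also match
Only number-words like 1234 should not match
Words containing same numbers like ab22s should not match
234 should not match
hel1lo2haha3hoho4
hel1lo2haha3hoho3
"""

# The four patterns from the answers, keyed by author.
PATTERNS = {
    "CertainPerformance": r"\b(?=\d*[a-z])(?=[a-z]*\d)(?:[a-z]|(\d)(?!\w*\1))+\b",
    "anubhava": r"\b(?=[a-z]*\d)(?=\d*[a-z])(?!\w*(\d)\w*\1)[a-z\d]+\b",
    "revo": r"\b(?![a-z]+\b|\d+\b|\w*(\d)\w*\1)[a-z\d]+\b",
    "Code Maniac": r"\b(?!\w*(\d)\w*\1)(?=(?:[a-z]+\d+)|(?:\d+[a-z]+))[a-z0-9]+\b",
}


def all_matches(text):
    """Map each author to the list of full matches their pattern finds."""
    return {
        author: [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]
        for author, pattern in PATTERNS.items()
    }


for author, found in all_matches(SAMPLE).items():
    print(f"{author}: {found}")
```

Agreement on this sample says nothing about other inputs, of course; it only confirms that each pattern meets the expected matches listed in the question.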

Code Maniac
  • Looking at all the answers here reminds me why I think regexes aren't the best way to solve this sort of problem… They're highly ingenious, but I'd hate to have to debug or maintain any of them! – gidds Feb 02 '19 at 08:36
  • @gidds You can approach them the same way you approach *any* programming problem - break the problem down into logical groups, (re?)write and verify each group, and put it together into a single pattern. REs are a great concise way to match strings - they're extremely flexible and mostly language agnostic, which is a huge plus. As long as the pattern to debug has descriptive comments (like in the answers here), it shouldn't be hard at all for someone with a bit of experience with REs, IMO – CertainPerformance Feb 03 '19 at 21:02
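
On the maintainability point, some regex flavours let you keep the descriptive comments inside the pattern itself. As one possible sketch, here is anubhava's pattern written with Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and treats `#` as a comment marker. Python and the name `UNIQUE_DIGIT_WORD` are assumptions for illustration; other flavours offer a similar `x` flag.

```python
import re

UNIQUE_DIGIT_WORD = re.compile(
    r"""
    \b                      # start at a word boundary
    (?= [a-z]* \d )         # any leading letters are followed by a digit
    (?= \d* [a-z] )         # any leading digits are followed by a letter
    (?! \w* (\d) \w* \1 )   # no digit occurs twice in the word
    [a-z\d]+                # the word itself: letters and digits only
    \b                      # end at a word boundary
    """,
    re.IGNORECASE | re.VERBOSE,
)

text = "ab12s should match, ab22s and 1234 should not"
print([m.group(0) for m in UNIQUE_DIGIT_WORD.finditer(text)])
```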