17

I have been trying to validate a string in VB.NET. It must contain the three letters A, B and C, in no particular order, and they do not need to be next to one another.

I can do this easily using LINQ:

MessageBox.Show(("ABC").All(Function(n) ("AAAABBBBBCCCC").Contains(n)).ToString)

However, after searching Google and SO for over a week, I am completely stumped. My closest pattern is ".*[A|B|C]+.*[A|B|C]+.*[A|B|C]+.*"; however, AAA also returns true. I know I can do this using other methods, but after trying for a week I really want to know whether it is possible using one regular expression.

p.s.w.g
Sam Johnson
  • Is there a reason you need Regex? And is it always just a set of 3 characters? – user7116 Sep 05 '13 at 18:11
  • I am trying to learn how to use regex. In order to learn, one must set tasks for his/her self. Thanks for taking the time to reply :) – Sam Johnson Sep 05 '13 at 18:13
  • You might try a different problem for learning regex. While it's possible to solve this one, as Jerry has shown, it's not a particularly good problem for a regular expression (many string libraries even offer a `ContainsAll` method). – ssube Sep 05 '13 at 18:31

4 Answers

16

Your original pattern won't work because inside a character class the | is a literal character, not alternation. The pattern therefore matches any number of characters, followed by one or more A, B, C, or | characters, then the same twice more, followed by any number of characters. Nothing requires the three classes to match different letters, so AAA satisfies it.
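To see this concretely, here is a small sketch that runs the pattern from the question through Regex.IsMatch. The module name, the MatchesOriginal helper and the test strings are only illustrative and are not part of the original question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module OriginalPatternDemo
    ' Pattern copied from the question.
    Const OriginalPattern As String = ".*[A|B|C]+.*[A|B|C]+.*[A|B|C]+.*"

    ' Hypothetical helper: tests an input against the question's pattern.
    Function MatchesOriginal(input As String) As Boolean
        Return Regex.IsMatch(input, OriginalPattern)
    End Function

    Sub Main()
        Console.WriteLine(MatchesOriginal("CAB")) ' True, as intended
        Console.WriteLine(MatchesOriginal("AAA")) ' True, although B and C are missing
        Console.WriteLine(MatchesOriginal("|||")) ' True, because | is literal inside [...]
        Console.WriteLine(MatchesOriginal("AB"))  ' False, fewer than three class characters
    End Sub
End Module
```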

I'd probably go with the code you've already written, but if you really want to use a regular expression, you can use a series of lookahead assertions, like this:

(?=.*A)(?=.*B)(?=.*C)

This will match any string that contains A, B, and C in any order.
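As a rough illustration of how this might be used from VB.NET, the sketch below wraps the pattern in a helper. The module name and the ContainsAbc function are hypothetical names chosen for this example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaheadDemo
    ' Hypothetical helper: True when the input contains A, B and C somewhere.
    Function ContainsAbc(input As String) As Boolean
        Return Regex.IsMatch(input, "(?=.*A)(?=.*B)(?=.*C)")
    End Function

    Sub Main()
        Console.WriteLine(ContainsAbc("AAAABBBBBCCCC")) ' True
        Console.WriteLine(ContainsAbc("xCxBxA"))        ' True, order does not matter
        Console.WriteLine(ContainsAbc("AAA"))           ' False, no B or C
        Console.WriteLine(ContainsAbc("abc"))           ' False, matching is case-sensitive by default
    End Sub
End Module
```

Note that . does not match a newline by default, so if the three letters could sit on different lines of the input, RegexOptions.Singleline would be needed as well.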

p.s.w.g
  • Thank you for taking the time to explain my pattern and providing an example. – Sam Johnson Sep 05 '13 at 18:23
  • You should mention the lookaheads are zero-width matches and should be followed by a pattern to match against (eg a `.+` following the lookaheads): `(?=.*A)(?=.*B)(?=.*C).+` – bobobobo Jan 05 '18 at 18:04
  • @bobobobo Not necessarily. A zero-length string is a valid pattern, so if you're just testing for a match (and you don't care to actually capture a match), then just having a zero-width assertion alone works fine. See https://repl.it/repls/BumpyFoolhardyBarnswallow – p.s.w.g Jan 05 '18 at 19:23
6

You can make use of positive lookaheads:

^(?=.*A)(?=.*B)(?=.*C).+

(?=.*A) makes sure there's an A somewhere in the string and the same logic applies to the other lookaheads.
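The lookaheads themselves consume no characters; it is the trailing .+ that makes the match cover the string. The sketch below compares the pattern with and without the ^ and .+ parts. The module name and the MatchedText helper are hypothetical, introduced only for this comparison.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnchoredLookaheadDemo
    ' Hypothetical helper: returns the text of the first match ("" if none).
    Function MatchedText(input As String, pattern As String) As String
        Return Regex.Match(input, pattern).Value
    End Function

    Sub Main()
        Dim input As String = "xCxBxA"

        ' Lookaheads only: the match succeeds but is zero characters long.
        Dim bare As Match = Regex.Match(input, "(?=.*A)(?=.*B)(?=.*C)")
        Console.WriteLine(bare.Success)      ' True
        Console.WriteLine(bare.Value.Length) ' 0

        ' With ^ and .+ the match covers the whole input.
        Console.WriteLine(MatchedText(input, "^(?=.*A)(?=.*B)(?=.*C).+")) ' xCxBxA
    End Sub
End Module
```

If only a yes/no answer is needed, both forms give the same result; the longer form matters when the matched text itself is used.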

Jerry
  • Oh my. How close I was not. I understand the grouping, but would you mind explaining (?= Is that the positive lookahead? – Sam Johnson Sep 05 '13 at 18:16
  • @SamJohnson Yup! It's a bit of an 'advanced' regex syntax if you ask me. I took a while myself to understand them as I do now. Basically, you will get a match only if the condition within the brackets is matched. So, if the string matches `.*A` (i.e. any number of characters, then an `A`), it will match. Applying 3 like that in a row is like checking 3 conditions and making sure they are satisfied before continuing further. This is a rough description of the lookahead, but you can get more information [here](http://www.regular-expressions.info/lookaround.html). – Jerry Sep 05 '13 at 18:20
  • Thank you so much for your time and effort. I accepted your answer. I really do appreciate it. – Sam Johnson Sep 05 '13 at 18:23
  • @SamJohnson Unfortunately you can 'accept' only one answer, that's how the site works ^^; But it's okay if you accept p.s.w.g's answer. I should probably have mentioned why your regex was not working as well. – Jerry Sep 05 '13 at 18:26
2

You can use zero-width lookaheads. Lookaheads are great for eliminating match possibilities that don't meet certain criteria.

For example, let's use the words

untie queue unique block unity

Start with a basic word match:

\b\w+\b

To require that the word matched with \w+ begins with un, we could use a positive lookahead:

\b(?=un)\w+\b

What this says is

  • \b Assert a word boundary (zero-width: it matches a position, not a blank character)
  • (?=un) Are the next letters "un"? If not, NO MATCH. If so, then possible match. Nothing is consumed.
  • \w+ One or more word characters
  • \b Assert a word boundary

A positive lookahead eliminates a match possibility if the text at the current position does NOT match the expression inside. It consumes no characters, so whatever follows it in the pattern starts matching at the same position. Here (?=un) and \w+ both start at the word boundary, which requires that the word BEGINS WITH un. If it does not, the match fails at that position.

How about matching any words that do not begin with un? Simply use a "negative lookahead"

\b(?!un)\w+\b

  • \b Assert a word boundary (zero-width: it matches a position, not a blank character)
  • (?!un) Are the next letters "un"? If SO, NO MATCH. If not, then possible match. Nothing is consumed.
  • \w+ One or more word characters
  • \b Assert a word boundary
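
The three word patterns above can be tried against the sample words with a short sketch like the following. The module name and the FindWords helper are hypothetical names used only for this illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module WordLookaheadDemo
    ' Hypothetical helper: returns every match of the pattern, joined by ", ".
    Function FindWords(input As String, pattern As String) As String
        Dim words As IEnumerable(Of String) =
            Regex.Matches(input, pattern).Cast(Of Match)().Select(Function(m) m.Value)
        Return String.Join(", ", words)
    End Function

    Sub Main()
        Dim sample As String = "untie queue unique block unity"

        Console.WriteLine(FindWords(sample, "\b\w+\b"))       ' untie, queue, unique, block, unity
        Console.WriteLine(FindWords(sample, "\b(?=un)\w+\b")) ' untie, unique, unity
        Console.WriteLine(FindWords(sample, "\b(?!un)\w+\b")) ' queue, block
    End Sub
End Module
```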

So for your requirement of having at least 1 A, 1 B and 1 C in the string, a pattern like

(?=.*A)(?=.*B)(?=.*C).+

works because it says:

  • (?=.*A) - Is there an A after any number of characters (.*)? If so, possible match; if not, no match.
  • (?=.*B) - Is there a B after any number of characters (.*)? If so, possible match; if not, no match.
  • (?=.*C) - Is there a C after any number of characters (.*)? If so, possible match; if not, no match.
  • .+ If the above 3 lookahead requirements were met, match any characters. If not, then match no characters (and so there isn't a match).
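
Since each required letter contributes one lookahead, the same idea extends to other sets of letters. The sketch below is only an illustration of that, not something from the answer itself: the module name and the BuildPattern helper are hypothetical, and Regex.Escape is used so that characters with a special meaning in regex would be treated literally.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module RequiredLettersDemo
    ' Hypothetical helper: builds one lookahead per required character,
    ' followed by .+ to consume the text, mirroring the pattern above.
    Function BuildPattern(letters As String) As String
        Dim lookaheads As IEnumerable(Of String) =
            letters.Select(Function(c) "(?=.*" & Regex.Escape(c.ToString()) & ")")
        Return String.Concat(lookaheads) & ".+"
    End Function

    Sub Main()
        Dim pattern As String = BuildPattern("ABC")

        Console.WriteLine(pattern)                          ' (?=.*A)(?=.*B)(?=.*C).+
        Console.WriteLine(Regex.IsMatch("BxxCxA", pattern)) ' True
        Console.WriteLine(Regex.IsMatch("AAB", pattern))    ' False, no C
    End Sub
End Module
```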
bobobobo
0

Does it have to be a regex? That's something that can easily be solved without one.

I've never programmed in VB, but I'm sure there are helper functions that let you take a string, and query whether or not a character occurs in it.

If str is your string, maybe something like:

str.Contains("A") AndAlso str.Contains("B") AndAlso str.Contains("C")

rix0r
  • "Does it have to be a regex? That's something that can easily be solved without one." Didn't I already provide a working example not using regex? – Sam Johnson Sep 05 '13 at 18:24