  1. Given n arbitrary strings s1, ..., sn, how can I write a regex that matches any of them? For example, take the sentence:

    I am not smart.

    If I want to match am and ar, I can use a[mr].

    But if I want to match am and not, I don't know how, because brackets can only specify a set of characters, not a set of strings.

  2. As another example, how can I match both a* and b*? Is there a particular way to handle this case?

Thanks.

Tim
  • I'm not clear on what you are trying to do... especially with #2, you want to match 0 or more a's and b's? – Smern Jun 07 '14 at 03:38
  • possible duplicate of [Reference - What does this regex mean?](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – John Dvorak Jun 10 '14 at 03:14

2 Answers


You can use a capturing or non-capturing group to separate your expressions with the alternation operator.

(am|not)    # group and capture to \1: 'am' OR 'not'
(?:am|not)  # group, but do not capture: 'am' OR 'not'
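
For the general case of n arbitrary strings, the alternation can be built programmatically. The following is a minimal sketch, not a definitive implementation: `escapeRegExp` and `anyOf` are hypothetical helper names, and sorting longer strings first is an added precaution so that a string such as "not" is tried before a shorter prefix such as "no".

```javascript
// Escape regex metacharacters so each input string is matched literally.
// (Hypothetical helper; not part of the original answer.)
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a global regex with one non-capturing alternation of all the strings.
// (Hypothetical helper; not part of the original answer.)
function anyOf(strings) {
  // Try longer alternatives first so a short prefix cannot shadow a longer string.
  const sorted = [...strings].sort((a, b) => b.length - a.length);
  return new RegExp("(?:" + sorted.map(escapeRegExp).join("|") + ")", "g");
}

const found = "I am not smart.".match(anyOf(["am", "not"]));
console.log(found); // [ 'am', 'not' ]
```

Escaping matters as soon as one of the strings contains a character such as `.` or `*`, which would otherwise be read as regex syntax.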

To match a or b, followed by the * quantifier (meaning 0 or more times):

(a|b)*      # group and capture to \1 (0 or more times): 'a' OR 'b'
(?:a|b)*    # group, but do not capture (0 or more times): 'a' OR 'b'

Or using a character class:

([ab]*)     # group and capture to \1: any character of: 'a', 'b' (0 or more times)
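
As a rough check of the patterns above, the sketch below compares the alternation and the character class in JavaScript. The `^` and `$` anchors and the sample strings are assumptions added for illustration. The last pattern covers one other possible reading of the question (a run of only a's or only b's), which the answer does not claim is what was asked.

```javascript
// Anchors (^ and $) are added here so that the whole string must match.
const viaAlternation = /^(?:a|b)*$/;
const viaClass = /^[ab]*$/;

// Both forms accept and reject the same sample strings.
const samples = ["", "a", "abba", "abc"];
const agree = samples.every(s => viaAlternation.test(s) === viaClass.test(s));
console.log(agree); // true

// A quantified capturing group keeps only its last iteration...
const lastIteration = "abba".match(/^(a|b)*$/)[1];
console.log(lastIteration); // "a"

// ...while a group around the quantified class captures the whole run.
const wholeRun = "abba".match(/^([ab]*)$/)[1];
console.log(wholeRun); // "abba"

// Alternative reading: only a's or only b's, never mixed.
const onlyOneLetter = /^(?:a*|b*)$/;
console.log(onlyOneLetter.test("aaa"), onlyOneLetter.test("ab")); // true false
```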
hwnd

If you want to match a specific and defined set of words, you can have them fully typed out and separated with the OR operator, |:

(am|not|smart)

Depending on the language you're using, you'll need different flags or functions to find every match rather than just the first. In JavaScript, for instance, you would use the g flag:

str.match(/(am|not|smart)/g);

Whereas in PHP you would simply use the preg_match_all() function:

preg_match_all('/(am|not|smart)/', $str, $matches);
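
To make the JavaScript call above concrete, here is a small sketch using the sentence from the question (the variable names are only illustrative). It shows what the g flag changes, and uses the standard `String.prototype.matchAll` when the position of each match is also needed.

```javascript
const str = "I am not smart.";

// Without the g flag, only the first match is returned (with its index).
const first = str.match(/(am|not|smart)/);
console.log(first[0], first.index); // am 2

// With the g flag, match() returns every matched substring.
const all = str.match(/(am|not|smart)/g);
console.log(all); // [ 'am', 'not', 'smart' ]

// matchAll() requires the g flag and yields a full match object per hit.
const positions = [...str.matchAll(/(am|not|smart)/g)].map(m => m.index);
console.log(positions); // [ 2, 5, 9 ]
```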

If you're looking to match "all words", i.e. "any word", you can use the word-boundary \b:

\b([a-zA-Z]+)\b

This, of course, can be modified to accept punctuation or numeric values as well.
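
A short sketch of the word-boundary pattern above in JavaScript. The second sample sentence is made up for illustration; it shows that \b can also be combined with an alternation so that "am" inside "hammer" or "not" inside "knot" is skipped.

```javascript
// Every run of letters delimited by word boundaries.
const words = "I am not smart.".match(/\b([a-zA-Z]+)\b/g);
console.log(words); // [ 'I', 'am', 'not', 'smart' ]

const sentence = "a hammer is not a knot";

// Plain alternation also matches inside longer words.
const loose = sentence.match(/(?:am|not)/g);
console.log(loose); // [ 'am', 'not', 'not' ]

// With \b on both sides, only whole words match.
const whole = sentence.match(/\b(?:am|not)\b/g);
console.log(whole); // [ 'not' ]
```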

Regarding your second question, you already hinted at the answer in your first with a character set (i.e. characters within brackets). To capture any a or b character followed by anything else:

([ab].*)

If you want them to have to be followed by other letters (which can be expanded from here):

([ab][a-z]+)
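
The difference between the two patterns above can be seen in a quick JavaScript sketch. The sample sentences are assumptions chosen for illustration. Note that neither pattern is anchored to the start of a word, so a match can begin in the middle of one.

```javascript
// [ab].* starts at the first a or b and runs to the end of the line.
const greedy = "I am not smart.".match(/([ab].*)/)[1];
console.log(greedy); // "am not smart."

// [ab][a-z]+ needs at least one lowercase letter after the a or b.
const runs = "I am not smart, but able.".match(/([ab][a-z]+)/g);
console.log(runs); // [ 'am', 'art', 'but', 'able' ]
```

The "art" result comes from "smart"; adding \b in front of the pattern would restrict matches to words that begin with a or b.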
newfurniturey