2

Is there a regular expression to find two different words in a sentence? Extra credit for an expression that works in MS Visual Studio 2008 :)

For example:

reg_ex_match(A, B, "A sentence with A and B") = true
reg_ex_match(C, D, "A sentence with A and B") = false

See also this related question

Thomas Bratt
  • 1
    Try giving a complete example of what you want to happen? Is it OR or AND you require? What range of characters are allowed in A, B, C and D? – AnthonyWJones Feb 06 '09 at 13:56
  • What exactly do you mean by “word”? A sequence that is delimited by space characters, or by the beginning or the end of the string? – Gumbo Feb 06 '09 at 16:47

7 Answers

10

For real words:

\bA\b.+\bB\b|\bB\b.+\bA\b
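
A minimal sketch of how this pattern might be applied, written in Python rather than in the Visual Studio search box. The helper name `contains_both` is hypothetical, and `re.escape` is added on the assumption that the two words are literal text rather than sub-patterns:

```python
import re

def contains_both(word_a: str, word_b: str, sentence: str) -> bool:
    """Return True if both words occur as whole words, in either order."""
    # Escape the words so characters such as '.' are treated literally.
    a, b = re.escape(word_a), re.escape(word_b)
    # Same shape as the pattern above: A ... B or B ... A, each word
    # delimited by \b word boundaries.
    pattern = rf"\b{a}\b.+\b{b}\b|\b{b}\b.+\b{a}\b"
    return re.search(pattern, sentence) is not None

print(contains_both("A", "B", "A sentence with A and B"))
print(contains_both("C", "D", "A sentence with A and B"))
```

Because `\b` treats a hyphen as a boundary, `contains_both("A", "B", "A-B")` also reports a match, which is the case raised in the comments below.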
Gumbo
  • I guess it depends on what is meant by "words" by the OP. And, the second half of your expression is double B's. – alphadogg Feb 06 '09 at 14:02
  • Also, note that a word boundary may not be what you want. To the OP, is "A-B" one word or two every time? Ex: a last name is sometimes hyphenated. – alphadogg Feb 06 '09 at 14:03
3

".*A.*B.*|.*B.*A.*" You can add spaces around the words A and B if you want proper words.

Łukasz Lew
  • Careful. This would match "The sentence with AB". Close, though. – alphadogg Feb 06 '09 at 13:38
  • Which would be proper behaviour. If you define a word as a separate word, then, as I said, you should add spaces around it. – Łukasz Lew Feb 06 '09 at 13:42
  • Spaces won't cut it, because a word might be at the beginning or the end of the string. In that case it should still be considered a separate word, but it has no space before or after it. See @Gumbo's solution using \b for the "real" solution. – Joachim Sauer Feb 06 '09 at 13:49
  • Be careful with word boundaries. I've seen lots of people get bitten by not realizing that some "words" they had in their dataset contained characters not in the boundary definition for whatever "flavor" of regex they were using. – alphadogg Feb 06 '09 at 14:09
  • This would also match AUTOBAHN or BAILOUT since the .* will also match word characters that surround or are in between the A and B (or B and A). It would even match something like "And always be sure to look both ways BEFORE crossing the street." – Bryan Feb 06 '09 at 21:08
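
The difference the comments describe can be checked with a short sketch. It is written in Python as an illustration; the names `loose` and `strict` are hypothetical labels for this answer's pattern and for the `\b`-based pattern from the answer above:

```python
import re

# Pattern from this answer: A and B may appear anywhere, even inside words.
loose = re.compile(r".*A.*B.*|.*B.*A.*")
# Word-boundary pattern: A and B must each stand alone as a word.
strict = re.compile(r"\bA\b.+\bB\b|\bB\b.+\bA\b")

samples = [
    "A sentence with A and B",
    "The sentence with AB",
    "AUTOBAHN",
    "BAILOUT",
]

for text in samples:
    print(f"{text!r}: loose={bool(loose.search(text))}, "
          f"strict={bool(strict.search(text))}")
```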
0

Regex

The following regex matches the entire string only if the string contains all of the words "all", "your", "words" and "here", in any order. You can easily add other words or remove existing ones.

(?=.*?\ball\b)
(?=.*?\byour\b)
(?=.*?\bwords\b)
(?=.*?\bhere\b)
.*

Not so complicated.
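
A minimal sketch of the lookahead approach in Python, assuming single-line input. The pattern is laid out on separate lines as above, so `re.VERBOSE` is used to make the engine ignore the line breaks; the names `ALL_WORDS` and `contains_all` are hypothetical:

```python
import re

# Each lookahead asserts that one word occurs somewhere ahead of the start
# of the string; the final .* then consumes the whole line.
ALL_WORDS = re.compile(
    r"""
    (?=.*?\ball\b)
    (?=.*?\byour\b)
    (?=.*?\bwords\b)
    (?=.*?\bhere\b)
    .*
    """,
    re.VERBOSE,  # ignore the whitespace and newlines inside the pattern
)

def contains_all(sentence: str) -> bool:
    """Return True if every required word appears, in any order."""
    return ALL_WORDS.match(sentence) is not None

print(contains_all("here are all your words"))
print(contains_all("all your words"))
```

Since every lookahead starts from the same position, the order of the words in the sentence does not matter, and adding a word means adding one more `(?=.*?\bword\b)` line.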

mmdemirbas
0

The regular expression you are looking for is something like this:

/word1.*(?=word2)|word2.*(?=word1)/igm

This is also case insensitive and can be applied to multiline text.

Tested over at http://regexr.com/

Joshua Pinter
0

Why not use boolean logic, rather than a complicated regex?

Code not tested:

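// Requires: using System.Text.RegularExpressions;
public bool reg_ex_match(Regex A, Regex B, string s) {
    // True only when both patterns are found somewhere in s.
    return A.IsMatch(s) && B.IsMatch(s);
}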

Update: This assumes A and B are defined with word boundaries:

Regex A = new Regex(@"\bA\b");
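
For comparison, a minimal sketch of the same two-checks idea in Python instead of C#. It tests the example from the question; the patterns for C and D are assumptions built the same way as the one for A:

```python
import re

def reg_ex_match(pattern_a: re.Pattern, pattern_b: re.Pattern, s: str) -> bool:
    """Return True only if both compiled patterns are found somewhere in s."""
    return pattern_a.search(s) is not None and pattern_b.search(s) is not None

# Each word is wrapped in \b so that it only matches as a whole word.
A = re.compile(r"\bA\b")
B = re.compile(r"\bB\b")
C = re.compile(r"\bC\b")
D = re.compile(r"\bD\b")

sentence = "A sentence with A and B"
print(reg_ex_match(A, B, sentence))
print(reg_ex_match(C, D, sentence))
```

Keeping the two checks separate avoids having to spell out both orderings in one alternation.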
toolkit
0

.*A.*\s.*B.*|.*B.*\s.*A.*

Please note the use of '\s' between A and B. This is to ensure you match on separate A and B. If this is not a requirement, then Łukasz Lew's answer is correct.

UPDATE: Changed as per Bryan's excellent observation below. The above expression will recognize A separated from B (or vice versa) with at least one whitespace character (space, tab or line break) between the two regions of interest.

alphadogg
-1

Try searching regexlib, a regular expression repository.

Seki
Kon
  • 1
    This is bad advice anyway. The quality of regexes on that site is all over the place, and peer review is almost non-existent. – Alan Moore Apr 12 '14 at 23:00