299

I need a logical AND in a regex.

Something like

jack AND james

should match both of the following strings:

  • 'hi jack here is james'

  • 'hi james here is jack'

codaddict
Meloun
  • 1
    Possible duplicate: [mulitple-words-in-any-order-using-regex](http://stackoverflow.com/questions/1177081/mulitple-words-in-any-order-using-regex) – Anderson Green Jun 03 '13 at 04:44
  • @AndersonGreen, the question was prematurely locked. The answers are severely lacking as those solutions are not viable since most regex don't recognize **lookaround** and **mode quantifier**. I believe **quantifier** existed at the point of the question being asked. – XPMai Jun 01 '20 at 10:51

9 Answers

381

You can do checks using positive lookaheads. Here is a summary from the indispensable regular-expressions.info:

Lookahead and lookbehind, collectively called “lookaround”, are zero-length assertions...lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called “assertions”. They do not consume characters in the string, but only assert whether a match is possible or not.

It then goes on to explain that positive lookaheads are used to assert that what follows matches a certain expression without taking up characters in that matching expression.

So here is an expression using two successive positive lookaheads to assert that the phrase contains jack and james in either order:

^(?=.*\bjack\b)(?=.*\bjames\b).*$

Test it.

The expressions in parentheses starting with ?= are the positive lookaheads. I'll break down the pattern:

  1. ^ asserts the start of the string to be matched.
  2. (?=.*\bjack\b) is the first positive lookahead, saying that what follows must match .*\bjack\b.
  3. .* means any character (except a line break) zero or more times.
  4. \b means a word boundary: a position between a word character and a non-word character, or between a word character and the start or end of the string.
  5. jack is literally those four characters in a row (the same for james in the next positive lookahead).
  6. $ asserts the end of the string to be matched.

So the first lookahead says "what follows (and is not itself a lookahead or lookbehind) must be a string that starts with zero or more of any characters followed by a word boundary, then jack, then another word boundary," and the second lookahead says "what follows must be a string that starts with zero or more of any characters followed by a word boundary, then james, then another word boundary." After the two lookaheads comes .*, which simply matches any characters zero or more times, and $, which matches the end of the string.

"start with anything then jack or james then end with anything" satisfies the first lookahead because there are a number of characters then the word jack, and it satisfies the second lookahead because there are a number of characters (which just so happens to include jack, but that is not necessary to satisfy the second lookahead) then the word james. Neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".

I think you get the idea, but just to be absolutely clear, here it is with jack and james reversed, i.e. "start with anything then james or jack then end with anything". It satisfies the first lookahead because there are a number of characters (which just so happen to include james, but that is not necessary to satisfy the first lookahead) then the word jack, and it satisfies the second lookahead because there are a number of characters then the word james. As before, neither lookahead asserts the end of the string, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".
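To make the "zero-length assertion" point concrete, here is a minimal JavaScript sketch (the variable names are illustrative, not part of the original answer). It compares the two lookaheads on their own with the full pattern that also has the trailing .*$:

```javascript
// Lookaheads only: the match succeeds but consumes no characters.
const lookaheadsOnly = /^(?=.*\bjack\b)(?=.*\bjames\b)/;
// Lookaheads followed by .*$ : the match spans the whole line.
const fullLine = /^(?=.*\bjack\b)(?=.*\bjames\b).*$/;

const sample = "hi james here is jack"; // sample string from the question

const zeroWidth = sample.match(lookaheadsOnly);
const whole = sample.match(fullLine);

console.log(zeroWidth[0].length);            // 0: nothing was consumed
console.log(whole[0] === sample);            // true: .*$ took the full string
console.log(fullLine.test("hi jack here"));  // false: james is missing
```

Both lookaheads start from the same position (the start of the string), which is why the order of the names in the input does not matter.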

This approach has the advantage that you can easily specify multiple conditions.

^(?=.*\bjack\b)(?=.*\bjames\b)(?=.*\bjason\b)(?=.*\bjules\b).*$
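Because every condition is just one more lookahead, the pattern can be generated from a list of words. The following is a sketch only; `allWordsRegex` is a hypothetical helper name, and it assumes the inputs are plain words that should be matched literally:

```javascript
// Hypothetical helper: builds ^(?=.*\bw1\b)(?=.*\bw2\b)....*$ from a word list.
function allWordsRegex(words) {
  // Escape regex metacharacters so each word is matched literally.
  const escape = (w) => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // One positive lookahead per required word.
  const lookaheads = words.map((w) => `(?=.*\\b${escape(w)}\\b)`).join("");
  return new RegExp(`^${lookaheads}.*$`);
}

const fourNames = allWordsRegex(["jack", "james", "jason", "jules"]);

console.log(fourNames.source);
// ^(?=.*\bjack\b)(?=.*\bjames\b)(?=.*\bjason\b)(?=.*\bjules\b).*$
console.log(fourNames.test("jules met jason, james and jack")); // true
console.log(fourNames.test("jules met jason and jack"));        // false: james is missing
```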
Géry Ogam
Alin Purcaru
  • 5
    `vim` syntax: `^\(.*\<jack\>\)\@=\(.*\<james\>\)\@=.*$` or `\v^(.*<jack>)@=(.*<james>)@=.*$` – mykhal Aug 26 '14 at 15:58
  • Does anyone know why this would break (in JavaScript at least) when I try to search for strings starting with '#'? `^(?=.*\b#friday\b)(?=.*\b#tgif\b).*$` fails to match `blah #tgif blah #friday blah` but `^(?=.*\bfriday\b)(?=.*\btgif\b).*$` works fine. – btleffler Aug 24 '15 at 18:27
  • This isn't working for me, as demoed here: https://regex101.com/r/xI9qT0/1 – TonyH Aug 17 '16 at 17:38
  • @TonyH, for JavaScript you can remove the last `$` symbol from the pattern or remove the newline character from the test string; other languages (Python, PHP) on this website work perfectly. You can also remove `.*$` from the end: the regexp will still match the test string, but without selecting the whole test string as the match. – kupgov Dec 13 '16 at 18:28
  • Adding `(?i)` can also make it case insensitive. `^(?i)(?=.*\bjack\b)(?=.*\bjames\b).*$` – JStevens Jul 07 '17 at 12:32
  • Depending on the use case (printing the line containing the match, or perhaps just knowing if there is a match), this simpler one may suffice: `^(?=[\s\S]*jack)(?=[\s\S]*james)` – flow2k Mar 03 '19 at 07:04
171

Try:

james.*jack

If either order should match, combine both with |:

james.*jack|jack.*james
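The alternation needs one branch per ordering, so the number of branches grows factorially with the number of words. A small sketch of that, with hypothetical helper names (`permutations`, `anyOrderRegex`) and assuming the words contain no regex metacharacters:

```javascript
// Hypothetical helper: returns every ordering of the given items.
function permutations(items) {
  if (items.length <= 1) return [items];
  return items.flatMap((item, i) =>
    permutations([...items.slice(0, i), ...items.slice(i + 1)])
      .map((rest) => [item, ...rest])
  );
}

// Join each ordering with .* and combine the orderings with |.
function anyOrderRegex(words) {
  return new RegExp(permutations(words).map((p) => p.join(".*")).join("|"));
}

const twoNames = anyOrderRegex(["james", "jack"]);
console.log(twoNames.source);                         // james.*jack|jack.*james
console.log(twoNames.test("hi jack here is james"));  // true
console.log(twoNames.test("hi james here is jack"));  // true
console.log(twoNames.test("hi jack here"));           // false

// Three words already need 3! = 6 branches.
console.log(anyOrderRegex(["a", "b", "c"]).source.split("|").length); // 6
```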
icyrock.com
  • 1
    The accepted answer worked. this also worked perfectly for me. For searching code in visual studio 'find results'. – Yogurt The Wise May 25 '16 at 13:02
  • 7
    This one works for me and is much more concise & easy to understand than the accepted answer! – Kumar Manish Sep 26 '17 at 09:20
  • 2
    I needed a solution that only had two names to match, so this answer is more concise for that case. But the accepted answer becomes more concise beyond 2 since the number of "or"s increases factorially. For 3 names there would be 6 "or"s, 4 names would be 24 "or"s, etc. – WileCau Oct 24 '18 at 00:46
  • 2
    I would recommend to make it lazy `james.*?jack|jack.*?james`. This will help on large texts. – Jekis Jun 03 '19 at 10:38
  • 1
    Note this will also match such names as "jacky" and "jameson" – Gershom Maes Sep 04 '20 at 14:08
  • Only issue is you can't use proper capture grouping with this method without needing n² groups. – Nixinova Jan 12 '21 at 07:35
59

Explanation of the pattern I am going to write:

. matches any single character (letters, digits, and so on; by default it does not match a line break).

* means zero or more occurrences of whatever comes just before it.

| means 'or'.

So,

james.*jack

matches james, then any number of characters, then jack.

Since you want to accept either order, jack.*james or james.*jack, combine the two.

Hence the pattern:

jack.*james|james.*jack
Aryeh Beitz
Shubham Sharma
  • 11
    As a side note - you could also have edited @icyrock's answer (which is the same as yours, just 6 years earlier), your explanation is very useful on its own. – WoJ Jan 23 '18 at 14:24
  • 2
    Thank you for this answer, i however feel the need to point out that in VSCode search, your answer **jack.*james | james.*jack** will take the spaces between the '|' (or) symbol into consideration during the search. **jack.*james|james.*jack** works and doesnt look for the spaces – jgritten Jun 15 '18 at 17:29
  • 1
    Don't you need 2000 rep for the edit privilege? – Chris Strickland Nov 02 '21 at 23:05
37

It's short and sweet:

(?=.*jack)(?=.*james)

Test Cases:

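[
  "xxx james xxx jack xxx", // true: both names present
  "jack xxx james ",        // true: order does not matter
  "jack xxx jam ",          // false: james is missing
  "  jam and jack",         // false: james is missing
  "jack",                   // false: only one name
  "james",                  // false: only one name
]
.forEach(s => console.log(/(?=.*james)(?=.*jack)/.test(s)))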
vsync
Shivam Agrawal
  • Could you say how it works? A lookahead needs a word before it, and there is nothing. In this case `element (?=.*jack)` gives `element`, but `(?=.*jack)` alone gives no result. Also tried on an example string here: https://regex101.com – sygneto Nov 28 '20 at 15:30
9

You can do:

\bjack\b.*\bjames\b|\bjames\b.*\bjack\b
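The \b anchors are what separate this from the plain alternation: without them, longer words that merely contain the names also match. A short sketch for comparison (the test string is made up for illustration):

```javascript
const plain = /james.*jack|jack.*james/;
const bounded = /\bjack\b.*\bjames\b|\bjames\b.*\bjack\b/;

const tricky = "jacky met jameson"; // contains the names only as prefixes

console.log(plain.test(tricky));                     // true: substring match
console.log(bounded.test(tricky));                   // false: whole words required
console.log(bounded.test("hi james here is jack"));  // true
console.log(bounded.test("hi jack here is james"));  // true
```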
codaddict
7

The expression in this answer does that for one jack and one james in any order.

Here, we'd explore other scenarios.

METHOD 1: One jack and One james

In case two jack or two james should not be allowed, so that only exactly one jack and one james is valid, we can design an expression similar to:

^(?!.*\bjack\b.*\bjack\b)(?!.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$

Here, we would exclude those instances using these statements:

(?!.*\bjack\b.*\bjack\b)

and,

(?!.*\bjames\b.*\bjames\b)

RegEx Demo 1

We can also merge the two negative lookaheads into one. The two positive lookaheads must stay separate, because an alternation inside a single lookahead would accept a string that contains only one of the names:

^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$

RegEx Demo 2


If you wish to simplify, update, or explore the expression, it is explained in the top-right panel of regex101.com. You can watch the matching steps or modify them in this debugger link. The debugger demonstrates how a regex engine might consume some sample input strings step by step and perform the matching.


RegEx Circuit

jex.im visualizes regular expressions (image not shown).

Test

// Exactly one jack and exactly one james per line, in any order.
const regex = /^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b)(?=.*\bjack\b).*$/gm;
// Only the first two lines satisfy the pattern; the rest repeat a name.
const str = `hi jack here is james
hi james here is jack
hi james jack here is jack james
hi jack james here is james jack
hi jack jack here is jack james
hi james james here is james jack
hi jack jack jack here is james
`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

METHOD 2: One jack and One james in a specific order

The expression can also be designed to require a james first and then a jack, similar to the following one:

^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b.*\bjack\b).*$

RegEx Demo 3

and vice versa:

^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjack\b.*\bjames\b).*$

RegEx Demo 4
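As a quick check of the order-specific variant, here is a sketch that runs the james-then-jack expression from above against a few lines (the variable name is illustrative):

```javascript
// Exactly one james followed later by exactly one jack.
const jamesThenJack =
  /^(?!.*\bjack\b.*\bjack\b|.*\bjames\b.*\bjames\b)(?=.*\bjames\b.*\bjack\b).*$/;

console.log(jamesThenJack.test("hi james here is jack"));      // true
console.log(jamesThenJack.test("hi jack here is james"));      // false: wrong order
console.log(jamesThenJack.test("hi james jack here is jack")); // false: two jacks
```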

Emma
  • 1
    Great explanation. It would be even better if your method 1 could match both 'james' AND 'jack' in any order. Testing it, I found that your regex expression matches single 'james' or 'jack' – Kfcaio Aug 16 '20 at 11:06
7

There is no need for two lookaheads; one of the substrings can be matched normally.

^(?=.*?\bjack\b).*?\bjames\b.*

See this demo at regex101

Lookarounds are zero-length assertions (conditions). The lookahead here checks at the ^ start whether jack occurs later in the string; on success, the pattern matches up to james, and .* matches the rest (this part can be removed). A lazy dot (.*?) is used before the words, which are enclosed in \b word boundaries. Use the i flag to ignore case.
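A minimal sketch of this variant in JavaScript, showing the i flag and what changes when the trailing .* is dropped (variable names are illustrative):

```javascript
// One lookahead for jack; james is matched normally. The i flag ignores case.
const oneLookahead = /^(?=.*?\bjack\b).*?\bjames\b.*/i;

console.log(oneLookahead.test("hi jack here is james"));   // true
console.log(oneLookahead.test("Hi James, here is Jack"));  // true
console.log(oneLookahead.test("hi james here"));           // false: jack is missing

// Without the trailing .* the test result is the same,
// but the match stops right after james.
const validateOnly = /^(?=.*?\bjack\b).*?\bjames\b/;
console.log("hi james here is jack".match(validateOnly)[0]); // "hi james"
console.log("hi james here is jack".match(oneLookahead)[0]); // "hi james here is jack"
```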

bobble bubble
  • 1
    Very Good answer, thanks for sharing. One question: do we need `.*` after last `\b` or will that work without it also? – RavinderSingh13 Oct 24 '22 at 12:39
  • 1
    @RavinderSingh13 Thank you for your comment, good point! For just validating the `.*` in the end is indeed useless, it's just needed if the full match is wanted. – bobble bubble Oct 24 '22 at 13:15
5

Vim has a branch operator \& that is useful when searching for a line containing a set of words, in any order. Moreover, extending the set of required words is trivial.

For example,

/.*jack\&.*james

will match a line containing jack and james, in any order.

See this answer for more information on usage. I am not aware of any other regex flavor that implements branching; the operator is not even documented in the Regular Expression Wikipedia entry.

Firstrock
5

You can use regex quantifiers instead, since lookaround is not supported by every regex flavor.

(\bjames\b){1,}.*(\bjack\b){1,}|(\bjack\b){1,}.*(\bjames\b){1,}
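A short sketch of how this lookaround-free pattern behaves, run here in JavaScript purely for illustration (the answer targets engines without lookaround support). Note that each alternative has its own pair of capture groups, so only one pair is filled per match:

```javascript
const noLookaround =
  /(\bjames\b){1,}.*(\bjack\b){1,}|(\bjack\b){1,}.*(\bjames\b){1,}/;

console.log(noLookaround.test("hi james here is jack")); // true
console.log(noLookaround.test("hi jack here is james")); // true
console.log(noLookaround.test("hi jack here"));          // false
console.log(noLookaround.test("jacky met jameson"));     // false: \b requires whole words

// jack comes first here, so the second alternative (groups 3 and 4) matched.
const groups = "hi jack here is james".match(noLookaround);
console.log(groups[1], groups[2]); // undefined undefined
console.log(groups[3], groups[4]); // jack james
```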
XPMai
  • Why no one tries this, 0 voted answers might be the best, thanks mate. – captain_majid Aug 19 '20 at 23:00
  • @captain_majid, I apologize. After intense research and based on false positives data, I realized my original answer was wrong. I've fixed the regex code. This correct regex will work perfectly as expected. – XPMai Aug 28 '20 at 11:20
  • Your 1st example worked fine with me, and strangely even a simpler one like that worked also: `\b(word1|word2|word3|word4|etc)\b` I've tested it here: https://rubular.com/r/Pgn2d6dXXXHoh7 – captain_majid Aug 31 '20 at 13:01