How do I write a regular expression to match two given strings, at any position in the string?

For example, if I am searching for cat and mat, it should match:

The cat slept on the mat in front of the fire.
At 5:00 pm, I found the cat scratching the wool off the mat.

It should match no matter what precedes these strings.

Sam
Phani K

7 Answers

/^.*?\bcat\b.*?\bmat\b.*?$/m

Using the m modifier (which makes the ^ and $ metacharacters match at the beginning and end of each line, not just at the very beginning and end of the string):

  • ^ matches the beginning of the line
  • .*? lazily matches anything on the line before...
  • \b: a word boundary (as @codaddict discussed),
  • then the string cat and another word boundary; note that underscores are treated as "word" characters, so _cat_ would not match*;
  • .*?: any characters before...
  • boundary, mat, boundary
  • .*?: any remaining characters before...
  • $: the end of the line.
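
To try the pattern, here is a minimal sketch using Python 3's `re` module (an assumption; the answer doesn't name a language). `re.MULTILINE` stands in for the `/m` modifier, and the third sample line is invented to show a non-match.

```python
import re

# The answer's pattern; re.MULTILINE is Python's equivalent of the /m modifier,
# so ^ and $ match at the start and end of every line.
PATTERN = re.compile(r"^.*?\bcat\b.*?\bmat\b.*?$", re.MULTILINE)

text = (
    "The cat slept on the mat in front of the fire.\n"
    "At 5:00 pm, I found the cat scratching the wool off the mat.\n"
    "The dog slept on the rug.\n"
)

# findall returns each whole line that contains "cat" followed later by "mat".
matching_lines = PATTERN.findall(text)
for line in matching_lines:
    print(line)
```

Only the first two lines are printed; the third has neither word.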

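It's important to use \b to ensure the specified words aren't part of longer words. It's also worth using non-greedy wildcards (.*?) rather than greedy ones (.*): the lazy version makes the engine try the earliest occurrence of each word first. With greedy wildcards, on a string like "There is a cat on top of the mat which is under the cat." the engine first tries the last occurrence of "cat", finds no "mat" after it, and has to backtrack to the first one. The line still matches, but which occurrence the engine settles on can differ, which matters once you add capture groups.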
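
A small Python 3 sketch of both points, with invented sample strings: `\b` rejects `cat` inside `caterpillar`, and the greedy/lazy choice changes which text a capture group ends up holding.

```python
import re

# \b keeps "cat" from matching inside longer words such as "caterpillar".
with_boundaries = re.compile(r"\bcat\b.*?\bmat\b")
without_boundaries = re.compile(r"cat.*?mat")

sample = "There's a caterpillar on the mat."
print(bool(with_boundaries.search(sample)))     # False
print(bool(without_boundaries.search(sample)))  # True

# Lazy vs greedy: the difference shows up in what a group captures.
two_pairs = "cat on the mat, then another cat on the mat"
lazy = re.search(r"\bcat\b(.*?)\bmat\b", two_pairs).group(1)
greedy = re.search(r"\bcat\b(.*)\bmat\b", two_pairs).group(1)
print(repr(lazy))    # ' on the '
print(repr(greedy))  # ' on the mat, then another cat on the '
```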

* If you want to be able to match _cat_, you can use:

/^.*?(?:\b|_)cat(?:\b|_).*?(?:\b|_)mat(?:\b|_).*?$/m

which matches either underscores or word boundaries around the specified words. (?:...) denotes a non-capturing group, which can help performance and keeps unwanted captures out of the results.
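
A quick Python 3 check of the underscore case (the sample line is made up for illustration):

```python
import re

plain = re.compile(r"^.*?\bcat\b.*?\bmat\b.*?$", re.MULTILINE)
underscored = re.compile(
    r"^.*?(?:\b|_)cat(?:\b|_).*?(?:\b|_)mat(?:\b|_).*?$", re.MULTILINE
)

line = "The _cat_ slept on the _mat_."
# "_" is a word character, so there is no \b between "_" and "cat".
print(plain.search(line) is None)            # True
print(underscored.search(line) is not None)  # True
```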

Edit: A question was raised in the comments about whether the solution would work for phrases rather than just words. The answer is, absolutely yes. The following would match "A line which includes both the first phrase and the second phrase":

/^.*?(?:\b|_)first phrase here(?:\b|_).*?(?:\b|_)second phrase here(?:\b|_).*?$/m
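
If the phrases come from user input, one option is to build the pattern programmatically. This is a sketch in Python 3; `both_in_order` is a hypothetical helper name, and `re.escape` is used so that characters like `.` or `?` in a phrase are treated literally.

```python
import re

def both_in_order(first: str, second: str) -> re.Pattern:
    """Hypothetical helper: build the answer's pattern for two words or phrases.

    Note: \\b only behaves as expected when each phrase starts and ends
    with a word character.
    """
    a, b = re.escape(first), re.escape(second)
    return re.compile(
        rf"^.*?(?:\b|_){a}(?:\b|_).*?(?:\b|_){b}(?:\b|_).*?$", re.MULTILINE
    )

pattern = both_in_order("first phrase", "second phrase")
print(bool(pattern.search("A line which includes both the first phrase and the second phrase")))  # True
print(bool(pattern.search("Here the second phrase comes before the first phrase")))  # False
```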

Edit 2: If order doesn't matter, you can use:

/^.*?(?:\b|_)(first(?:\b|_).*?(?:\b|_)second|second(?:\b|_).*?(?:\b|_)first)(?:\b|_).*?$/m
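
The order-independent version, checked in Python 3 with cat and mat substituted for first and second (sample lines invented):

```python
import re

either_order = re.compile(
    r"^.*?(?:\b|_)(cat(?:\b|_).*?(?:\b|_)mat|mat(?:\b|_).*?(?:\b|_)cat)(?:\b|_).*?$",
    re.MULTILINE,
)

lines = [
    "The cat slept on the mat.",   # cat before mat
    "The mat was under the cat.",  # mat before cat
    "The cat slept on the rug.",   # only one of the words
]
results = [bool(either_order.search(s)) for s in lines]
print(results)  # [True, True, False]
```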

And if performance really is an issue here, a lookaround version (if your regex engine supports lookaround) might perform better than the above, though probably not. I'll leave both the arguably more complex lookaround version and the performance testing as an exercise for the questioner/reader.
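
For anyone who wants to try the lookaround route, this is one possible version, sketched in Python 3 with invented sample lines; it has not been benchmarked against the alternation pattern. Each lookahead scans the current line on its own, so word order doesn't matter.

```python
import re

# Without re.DOTALL, "." stops at line breaks, so each lookahead only
# inspects the current line; re.MULTILINE lets ^ and $ work per line.
lookaround = re.compile(r"^(?=.*?\bcat\b)(?=.*?\bmat\b).*$", re.MULTILINE)

text = (
    "The mat was under the cat.\n"
    "The cat slept on the rug.\n"
    "At 5:00 pm, I found the cat scratching the wool off the mat.\n"
)
print(lookaround.findall(text))
```

Whether it is faster depends on the engine and the input, so measure before relying on it.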

Edited per @Alan Moore's comment. I didn't have a chance to test it, but I'll take your word for it.

eyelidlessness
  • `[\b]` matches a **backspace**, not a word boundary; `\b` takes on a different meaning inside a character class. – Alan Moore Feb 08 '10 at 07:34
  • @Alan Moore, thanks for the tip. The way things change meaning in a character class always throws me for a loop. – eyelidlessness Feb 08 '10 at 15:50
  • For matching any characters before, `.*` seems like it should be preferred to `.*?` which prefers matching as few characters as possible, and might unnecessarily limit the amount of matches. – Hashim Aziz Sep 13 '18 at 00:54
  • Wow, so many great usernames in this one thread. @AlanMoore I'm a huge fan of your work. – Travis Heeter Dec 04 '20 at 12:14
(.* word1.* word2.* )|(.* word2.* word1.*)
Alan Moore
Johan
  • -1: incorrectly matches "a catastrophic mattress", fails on "cat on the mat" and doesn't observe word order (although that was specified only in the comments). – Tim Pietzcker Feb 08 '10 at 07:27
  • add the word boundaries so there are not substring matches – rxgx Jul 03 '13 at 00:46
  • This may not match what OP wanted but it helped me find two strings in a URL so +1 – PW Kad Sep 29 '14 at 14:57
  • And if the order is always the same and you don't care about spaces surrounding the words, you can use `(.*word1.*word2.*)` Simple and sweet and works very well for tricky Find Alls in things like Sublime Text, VSCode, etc that support REGEX searching. – Joshua Pinter Dec 15 '20 at 22:23

You can try:

\bcat\b.*\bmat\b

\b is an anchor and matches a word boundary. It will look for words cat and mat anywhere in the string with mat following cat. It will not match:

There's a caterpillar on the mat.

but will match

The cat slept on the mat in front of the fire

If you want to match strings that contain the letters cat followed later by mat, even inside longer words, you can try:

cat.*mat

This will match both the above example strings.
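
Both patterns side by side in Python 3 (strings modeled on the answer's examples):

```python
import re

strict = re.compile(r"\bcat\b.*\bmat\b")  # whole words, "mat" after "cat"
loose = re.compile(r"cat.*mat")           # the letters, even inside other words

examples = [
    "There's a caterpillar on the mat.",
    "The cat slept on the mat in front of the fire",
]
results = [(bool(strict.search(s)), bool(loose.search(s))) for s in examples]
for s, (strict_hit, loose_hit) in zip(examples, results):
    print(strict_hit, loose_hit, s)
```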

codaddict
  • Hmm.. not quite. It didn't match either string perfectly. It did match the "cat ... mat" pattern, but not the part before and after it. – Phani K Feb 08 '10 at 05:55
  • Oh ok..if he wants search cat and mat as *words*, you can add word boundary. Thanks Phanindra K. – codaddict Feb 08 '10 at 05:58
  • Thanks codaddict. I modified the regex to be something like this: .*?cat.*?mat.*? I hope that won't have any unwanted side effects. :) – Phani K Feb 08 '10 at 06:04
  • What if one of the words was not a "word" but a phrase (words with spaces in between)? Would this still work? – Phani K Feb 08 '10 at 06:08
  • Did you mean something like "ca t on a mat" ?? – codaddict Feb 08 '10 at 06:15
  • @Phanindra K: open up another question; as you are now describing a different problem than what you have written above. – mechanical_meat Feb 08 '10 at 06:15
  • @Adam Bernier and @Phanindra, an adequate solution to this question ought to be an adequate solution to the question raised in comments. No other post should be necessary. – eyelidlessness Feb 08 '10 at 07:16

If you absolutely need to use only one regex, then:

/(?=.*?(string1))(?=.*?(string2))/is

  • i modifier: case-insensitive matching
  • .*?: lazily matches any characters (as few as possible)
  • (?=...): positive lookahead; the group has to match somewhere ahead of the current position, without consuming any characters
  • s modifier: . (period) also matches line breaks
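
In Python 3, the `i` and `s` modifiers correspond to the `re.IGNORECASE` and `re.DOTALL` flags. A minimal sketch with cat and mat standing in for string1 and string2 (sample text invented); note that without `\b`, this pattern also accepts cat inside words like catastrophic.

```python
import re

# re.IGNORECASE ~ /i, re.DOTALL ~ /s (so .*? can cross the line break).
pattern = re.compile(r"(?=.*?(cat))(?=.*?(mat))", re.IGNORECASE | re.DOTALL)

text = "The CAT slept\non the Mat."
m = pattern.search(text)
print(m is not None)           # True: both words occur somewhere in the text
print(m.group(1), m.group(2))  # CAT Mat
print(pattern.search("Just a cat here.") is None)  # True: no "mat" anywhere
```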

Kevin Johnson

This is fairly light on processing power:

(string1(.|\n)*string2)|(string2(.|\n)*string1)

I used this in Visual Studio 2013 to find all files that contain both string1 and string2.
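
A sketch of the same idea in Python 3, applied to text you have already read from a file. `has_both` is a hypothetical helper name; `re.escape` keeps the search strings literal.

```python
import re

def has_both(text: str, first: str, second: str) -> bool:
    """Return True if text contains both strings, in either order, possibly on different lines."""
    a, b = re.escape(first), re.escape(second)
    # Same shape as the answer: (.|\n)* lets the gap between the strings span line breaks.
    pattern = rf"({a}(.|\n)*{b})|({b}(.|\n)*{a})"
    return re.search(pattern, text) is not None

# For a real file, pass its contents, e.g. open(path).read() (path is a placeholder).
print(has_both("line one has String1\nline two has String2", "String1", "String2"))  # True
print(has_both("only String2 here", "String1", "String2"))  # False
```

In Python, `re.DOTALL` makes `.` match line breaks too, so `.*` under that flag covers the same ground as `(.|\n)*` and is simpler to read.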

Michael Socha
  • Great! But how can I avoid whitespace and other non-word characters in between. I am trying to find all mix of whole words with the words 'student' and 'name'. So "studentpreferredname" and "student_name" are ok, but not "StudentID=@StudentID ORDER BY q.QualificationName" – Fandango68 Mar 21 '16 at 05:56
  • @Mike Socha III, how would you modify this so that it only finds strings that do NOT have *both* string1 and string2? – 208_man Dec 04 '20 at 22:33

You don't have to use a regex. In your favourite language, split each line on whitespace, go over the resulting words, and check for cat and mat; e.g. in Python:

>>> import string
>>> for line in open("file"):
...     # Split on whitespace and strip punctuation such as the "." in "mat."
...     words = {w.strip(string.punctuation) for w in line.split()}
...     if "cat" in words and "mat" in words:
...         print("found: ", line.rstrip())
...

found:  The cat slept on the mat in front of the fire.
found:  At 5:00 pm, I found the cat scratching the wool off the mat.
ghostdog74
  • this also matches mat before cat, which is what the question asks, but may not be the intent :) – Jimmy Feb 08 '10 at 06:39
  • The problem that regex solves in this case is having quite a lot more flexibility over what to consider a boundary. Considering only spaces to be boundaries, this would fail (even though it seems the questioner's intent to have it match): `Beware the cat; it lays on the mat.` And while regex may be slower than the equivalent code to do this more safely, the equivalent code could require dozens of lines of code to do it right. There's a justified caution on SO about regex, but it really is the appropriate tool for this job. – eyelidlessness Feb 08 '10 at 07:00
  • its also easy to strip off punctuations if punctuations are not needed, without regex – ghostdog74 Feb 08 '10 at 07:36

This works for searching files that contain both String1 and String2:

(((.|\n)*)String1((.|\n)*)String2)|(((.|\n)*)String2((.|\n)*)String1)

That is: match any number of characters or line feeds, followed by String1, followed by any number of characters or line feeds, followed by String2; OR match any number of characters or line feeds, followed by String2, followed by any number of characters or line feeds, followed by String1.

Alan Moore
Don
  • Hi, for the same very first example cat and mat. could you advise how to perform negative regex find. for instance. if I find cat and mat in a line. I should ignore that line. but if find cat with anything else I should capture that. So below send line cat with carpet should be considered. `The cat slept on the mat in front of the fire.` `At 5:00 pm, I found the cat scratching the wool off the carpet` – enthuguy May 12 '16 at 11:40