
I'm looking for a regex that matches the whole text between two OR operators if, and only if, that text contains one or more ANDs. For instance:

This should match:

OR "Message:\"An Arm and a Leg \<Meaning\>: Something that is extremely expensive.\"" AND "Message:\"Jaws of Death \<Meaning\>: Being in a dangerous or very deadly situation.\"" OR

OR "Message:\"Know the Ropes \<Meaning\>: Having a familiarity or understanding of how something works.\"" AND "Message:\"Poke Fun At \<Meaning\>: Making fun of something or someone; ridicule.\"" AND "Message:\"Give a Man a Fish \<Meaning\>: It's better to teach a person how to do something than to do that something for them.\"" AND "Message:\"Money Doesn't Grow On Trees \<Meaning\>: Suggests that money is a resource that must be earned and is not one that's easily acquired.\"" AND "Message:\"There's No I in Team \<Meaning\>: To not work alone, but rather, together with others in order to achieve a certain goal.\"" AND "Message:\"A Busy Bee \<Meaning\>: An industrious person.\"" AND "Message:\"Wake Up Call \<Meaning\>: An occurance of sorts that brings a problem to somebody's attention and they realize it needs fixing.\"" AND "Message:\"A Lot on One\'s Plate \<Meaning\>: A lot \(or too much\) to do or cope with.\"" AND "Message:\"Under the Weather \<Meaning\>: Not feeling well, in health or mood.\"" OR

This shouldn't match:

OR "Message:\"Break The Ice \<Meaning\>: Breaking down a social stiffness.\"" OR

This is placeholder text to use as an example:

"Message:\"Knock Your Socks Off \<Meaning\>: To be taken by surprise.\"" AND "Message:\"Playing For Keeps \<Meaning\>: Said when things are about to get serious.\"" OR "Message:\"Break The Ice \<Meaning\>: Breaking down a social stiffness.\"" OR "Message:\"Right Out of the Gate \<Meaning\>: Right from the beginning; to do something from the start.\"" OR "Message:\"Birds of a Feather Flock Together \<Meaning\>: People tend to associate with others who share similar interests or values.\"" AND "Message:\"Up In Arms \<Meaning\>: Angry; being roused to the point that you are ready to fight.\"" OR "Message:\"Know the Ropes \<Meaning\>: Having a familiarity or understanding of how something works.\"" AND "Message:\"Poke Fun At \<Meaning\>: Making fun of something or someone; ridicule.\"" AND "Message:\"Give a Man a Fish \<Meaning\>: It's better to teach a person how to do something than to do that something for them.\"" AND "Message:\"Money Doesn't Grow On Trees \<Meaning\>: Suggests that money is a resource that must be earned and is not one that's easily acquired.\"" AND "Message:\"There's No I in Team \<Meaning\>: To not work alone, but rather, together with others in order to achieve a certain goal.\"" AND "Message:\"A Busy Bee \<Meaning\>: An industrious person.\"" AND "Message:\"Wake Up Call \<Meaning\>: An occurance of sorts that brings a problem to somebody's attention and they realize it needs fixing.\"" AND "Message:\"A Lot on One\'s Plate \<Meaning\>: A lot \(or too much\) to do or cope with.\"" AND "Message:\"Under the Weather \<Meaning\>: Not feeling well, in health or mood.\"" OR "Message:\"A Day Late and a Dollar Short \<Meaning\>: Too late. 
A missed opportunity.\"" OR "Message:\"Back to Square One \<Meaning\>: To go back to the beginning; back to the drawing board.\"" OR "Message:\"An Arm and a Leg \<Meaning\>: Something that is extremely expensive.\"" AND "Message:\"Jaws of Death \<Meaning\>: Being in a dangerous or very deadly situation.\"" OR "Message:\"Barking Up The Wrong Tree \<Meaning\>: To make a wrong assumption about something.\"" OR "Message:\"Swinging For the Fences \<Meaning\>: Giving something your all.\"" OR "Message:\"Talk the Talk \<Meaning\>: Supporting what you say, not just with words, but also through action or evidence.\"" OR "Message:\"Back To the Drawing Board \<Meaning\>: Starting over again on a new design from a previously failed attempt.\"" OR "Message:\"On the Ropes \<Meaning\>: Being in a situation that looks to be hopeless!\"" OR "Message:\"Tug of War \<Meaning\>: It can refer to the popular rope pulling game or it can mean a struggle for authority.\"" AND "Message:\"A Dime a Dozen \<Meaning\>: Something that is extremely common.\"" AND "Message:\"In a Pickle \<Meaning\>: Being in a difficult predicament; a mess; an undesirable situation.\"" AND "Message:\"Ring Any Bells? 
\<Meaning\>: Recalling a memory; causing a person to remember something or someone.\"" AND "Message:\"When the Rubber Hits the Road \<Meaning\>: When something is about to begin, get serious, or put to the test.\"" AND "Message:\"Burst Your Bubble \<Meaning\>: To ruin someone's happy moment.\"" AND "Message:\"No Ifs, Ands, or Buts \<Meaning\>: Finishing a task without making any excuses.\"" AND "Message:\"Tough It Out \<Meaning\>: To remain resillient even in hard times; enduring.\"" OR "Message:\"Curiosity Killed The Cat \<Meaning\>: Typically said to indicate that any further investigation into a situation may lead to harm.\"" OR "Message:\"A Chip on Your Shoulder \<Meaning\>: Being angry about something that happened in the past.\"" OR "Message:\"A Cold Day in July \<Meaning\>: Something that is highly unlikely to happen.\"" OR "Message:\"Cry Over Spilt Milk \<Meaning\>: It's useless to worry about things that already happened and cannot be changed.\"" OR "Message:\"A Leg Up \<Meaning\>: Someone who's given an advantage over others.\"" OR "Message:\"It's Not Brain Surgery \<Meaning\>: A task that's easy to accomplish, a thing lacking complexity.\"" OR "Message:\"You Can't Judge a Book By Its Cover \<Meaning\>: Don't judge someone or something only by the outward appearance.\"" AND "Message:\"Down For The Count \<Meaning\>: Someone or something that looks to be defeated, or nearly so.\"" OR "Message:\"Yada Yada \<Meaning\>: A way to notify a person that what they're saying is predictable or boring.\"" AND "Message:\"Let Her Rip \<Meaning\>: Permission to start, or it could mean 'go faster!'\"" OR "Message:\"Wouldn't Harm a Fly \<Meaning\>: Nonviolent; someone who is mild or gentle.\"" OR "Message:\"Off One's Base \<Meaning\>: A person that is crazy or behaving in idiotic ways\"" AND "Message:\"Close But No Cigar \<Meaning\>: Coming close to a successful outcome only to fall short at the end.\"" AND "Message:\"It's Not All It's Cracked Up To Be \<Meaning\>: 
Failing to meet expectations; not being as good as people say.\"" AND "Message:\"What Am I, Chopped Liver? \<Meaning\>: A rhetorical question used by a person who feels they are being given less consideration than someone else.\"" AND "Message:\"A Dog in the Manger \<Meaning\>: Someone who prevents others from using valuable items even though they have no need for them.\"" AND "Message:\"A Bite at the Cherry \<Meaning\>: An opportunity that's not available to most people.\"" OR "Message:\"Don't Count Your Chickens Before They Hatch \<Meaning\>: Do not rely on something you are not sure of.\"

I'm using a positive lookbehind at the beginning and a positive lookahead at the end to set the boundaries. Between them I tried (.*?AND.*?), which matches any character between zero and unlimited times, as few times as possible, on either side of AND. I tried:

(?<=OR)(.*?AND.*?)(?=OR)

(?<=OR) (?:[\s\S])*? AND (?:[\s\S\w]+?)(?=OR)

They stop matching at the OR (after the AND), but they do not start matching at the first OR before the AND.
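The behavior described above can be reproduced on a much shorter input. The snippet below is only a sketch: `text` is a made-up stand-in for the long query, and `lazy` is the first attempted pattern. A lazy quantifier keeps the match as short as possible from a given start position, but it does not move that start position forward, so the first `.*?` is free to run across an intermediate OR until it finds an AND.

```python
import re

# Hypothetical short input standing in for the long query in the question.
text = 'x OR a OR b AND c OR d'

# First attempt from the question: lazy quantifiers between the lookarounds.
lazy = re.compile(r'(?<=OR)(.*?AND.*?)(?=OR)')

matches = lazy.findall(text)
# The match starts right after the first OR and swallows the second one.
print(matches)  # [' a OR b AND c ']
```

The wanted result here would be `' b AND c '`, which means the part before AND has to be prevented from crossing an OR.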

vJos

2 Answers


If I understand you correctly, you want to find the text between two ORs that contains one or more ANDs:

(?<=OR)((?:(?!OR).)+AND(?:(?!OR).)+)(?=OR)

Regex demo.
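As a quick check, here is a minimal sketch that runs the pattern above with Python's `re` module. The `sample` string is invented for illustration and is far shorter than the text in the question. Because the lookbehind and lookahead do not consume the ORs, the OR that closes one match can also open the next one.

```python
import re

# The pattern from this answer; (?:(?!OR).)+ matches one or more characters
# while refusing to step onto the start of an 'OR'.
PATTERN = re.compile(r'(?<=OR)((?:(?!OR).)+AND(?:(?!OR).)+)(?=OR)')

# Hypothetical sample: only two of the segments between ORs contain an AND.
sample = 'x OR a OR b AND c OR d OR e AND f AND g OR h'

found = PATTERN.findall(sample)
print(found)  # [' b AND c ', ' e AND f AND g ']
```

Note that `.` does not match a line break by default, so for input that spans several lines the pattern would need the `re.DOTALL` flag.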

Andrej Kesely
  • That's a good solution, provided "AND" or "OR" are not inside any quoted string. – craigb Nov 06 '22 at 22:09
  • I edited the answer to add more details. It should match one or more ANDs between two ORs – vJos Nov 06 '22 at 22:33
  • @vJos I've updated the regex + demo. – Andrej Kesely Nov 06 '22 at 22:37
  • I suggest you add some word boundaries to avoid matching `'OR'` as part of `'FOR'` and `'AND'` as part of `'HAND'`. – Cary Swoveland Nov 07 '22 at 00:41
  • @CarySwoveland see the regex in the timing comparison of my answer - I have used spaces around `' OR '` and `' AND '` as this seems to be the used syntax. – Claudio Nov 07 '22 at 08:00
  • Would you be so kind as to explain how `(?:(?!OR).)` works? In particular, why does switching `.` and `(?!OR)`, giving `(?:.(?!OR))`, work in the first term and give the same result, but not when the switch is made in the second, identical term after `AND`? – Claudio Nov 07 '22 at 10:42
  • @Claudio It's informally called *tempered greedy token* - the in-depth explanation can be found here: https://stackoverflow.com/questions/30900794/tempered-greedy-token-what-is-different-about-placing-the-dot-before-the-negat – Andrej Kesely Nov 07 '22 at 10:53
  • @AndrejKesely I have read the in-depth explanation but still can't resolve my confusion about why switching the order works for the first term, giving the same result, while doing the same in the second term results in no match at all. Most probably the difference comes from combining the switched *tempered greedy token* with a lookahead rather than with a literal match. To understand that, it is probably necessary to know how the regex engine works in detail and to construct some special test cases demonstrating the behavior. – Claudio Nov 07 '22 at 12:14
  • In the case of " ORAND AND OR OR AND OR text0 OR text1 AND text2 OR ", splitting results in `['text1 AND text2']`, while searching has side effects, giving `['AND AND ', ' AND ', ' text1 AND text2 ']`, which in a very large text makes it hard to see where the results come from. In practice the input is not always well formatted and free of unexpected sequences. – Claudio Nov 07 '22 at 12:39
  • @Claudio, regarding `(?:(?!OR).)+`, suppose the string were `'ab OR c'` and one wanted to simply match the characters that preceded the substring `'OR'` (i.e., `'ab '`). The string pointer is initially immediately before the first character of the string, `'a'`. `(?!OR)` is a negative lookahead that asserts that the next two characters (`'ab'`) do not equal `'OR'`. That is satisfied, so `.` matches `'a'` and the string pointer advances by one character to the position between the `'a'` and the `'b'`. `(?!OR)` is again satisfied because the next two characters (`'b '`) do not match `'OR'`... – Cary Swoveland Nov 07 '22 at 18:40
  • ...so `'b'` is matched. The match so far is `'ab'` and the string pointer is now between the `'b'` and `' '`. `(?!OR)` is again satisfied so `' '` is matched and the string pointer is now between the `' '` and `'O'`. At this point `(?!OR)` fails as the next two characters are `'OR'`. The match returned is therefore `'ab '`. If you repeat this logic for `(?:.(?!OR))+` you will find that only `'ab'` is matched. – Cary Swoveland Nov 07 '22 at 18:53
  • @CarySwoveland: thanks for the very simple explanation, which resolved my confusion. Switching the position of the `.` in the first term of the answer above has no effect on the result, because no OR is allowed on the left side of AND in the result, and an OR would have to be there for the switch to make a difference. Switching the positions in the second term does have an effect, because the term to the right of AND runs into an OR and, if switched, consumes the O, so that the positive lookahead for OR is not satisfied (which is why no result is returned). Have I got it right? – Claudio Nov 07 '22 at 20:06
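The walk-through in the comments above can be checked directly. The sketch below uses the same `'ab OR c'` string: with the lookahead first, the token consumes every character up to the OR; with the dot first, it stops one character earlier, so a following `(?=OR)` would no longer line up with the OR. The `BOUNDED` pattern is a hypothetical variant of the answer's regex with the suggested word boundaries added, and its test string is invented.

```python
import re

text = 'ab OR c'

# Lookahead first: a character is consumed only if 'OR' does not start at it.
lookahead_first = re.match(r'(?:(?!OR).)+', text).group()
print(repr(lookahead_first))  # 'ab '

# Dot first: a character is consumed only if 'OR' does not start right after it.
dot_first = re.match(r'(?:.(?!OR))+', text).group()
print(repr(dot_first))  # 'ab'

# Hypothetical word-boundary variant, so that FOR and HAND are not mistaken
# for the operators OR and AND.
BOUNDED = re.compile(
    r'(?<=\bOR\b)((?:(?!\bOR\b).)+\bAND\b(?:(?!\bOR\b).)+)(?=\bOR\b)'
)
bounded_found = BOUNDED.findall('OR FOR HAND OR a AND b OR')
print(bounded_found)  # [' a AND b ']
```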

I suggest using a simple regex to split on ' OR ' and then filtering the result in a list comprehension, instead of searching with a complex regex (to speed things up and to improve the readability of the code):

message = '''"Message:\"Knock Your Socks Off \<Meaning\>: To be taken by surprise.\"" AND "Message:\"Playing For Keeps \<Meaning\>: Said when things are about to get serious.\"" OR  "Message:\"Break The Ice \<Meaning\>: Breaking down a social stiffness.\"" OR "Message:\"Right Out of the Gate \<Meaning\>: Right from the beginning; to do something from the start.\"" OR "Message:\"Birds of a Feather Flock Together \<Meaning\>: People tend to associate with others who share similar interests or values.\"" AND "Message:\"Up In Arms \<Meaning\>: Angry; being roused to the point that you are ready to fight.\"" OR "Message:\"Know the Ropes \<Meaning\>: Having a familiarity or understanding of how something works.\"" AND "Message:\"Poke Fun At \<Meaning\>: Making fun of something or someone; ridicule.\"" AND "Message:\"Give a Man a Fish \<Meaning\>: It's better to teach a person how to do something than to do that something for them.\"" AND "Message:\"Money Doesn't Grow On Trees \<Meaning\>: Suggests that money is a resource that must be earned and is not one that's easily acquired.\"" AND "Message:\"There's No I in Team \<Meaning\>: To not work alone, but rather, together with others in order to achieve a certain goal.\"" AND "Message:\"A Busy Bee \<Meaning\>: An industrious person.\"" AND "Message:\"Wake Up Call \<Meaning\>: An occurance of sorts that brings a problem to somebody's attention and they realize it needs fixing.\"" AND "Message:\"A Lot on One\'s Plate \<Meaning\>: A lot \(or too much\) to do or cope with.\"" AND "Message:\"Under the Weather \<Meaning\>: Not feeling well, in health or mood.\"" OR "Message:\"A Day Late and a Dollar Short \<Meaning\>: Too late. 
A missed opportunity.\"" OR "Message:\"Back to Square One \<Meaning\>: To go back to the beginning; back to the drawing board.\"" OR "Message:\"An Arm and a Leg \<Meaning\>: Something that is extremely expensive.\"" AND "Message:\"Jaws of Death \<Meaning\>: Being in a dangerous or very deadly situation.\"" OR "Message:\"Barking Up The Wrong Tree \<Meaning\>: To make a wrong assumption about something.\"" OR "Message:\"Swinging For the Fences \<Meaning\>: Giving something your all.\"" OR "Message:\"Talk the Talk \<Meaning\>: Supporting what you say, not just with words, but also through action or evidence.\"" OR "Message:\"Back To the Drawing Board \<Meaning\>: Starting over again on a new design from a previously failed attempt.\"" OR "Message:\"On the Ropes \<Meaning\>: Being in a situation that looks to be hopeless!\"" OR "Message:\"Tug of War \<Meaning\>: It can refer to the popular rope pulling game or it can mean a struggle for authority.\"" AND "Message:\"A Dime a Dozen \<Meaning\>: Something that is extremely common.\"" AND "Message:\"In a Pickle \<Meaning\>: Being in a difficult predicament; a mess; an undesirable situation.\"" AND "Message:\"Ring Any Bells? 
\<Meaning\>: Recalling a memory; causing a person to remember something or someone.\"" AND "Message:\"When the Rubber Hits the Road \<Meaning\>: When something is about to begin, get serious, or put to the test.\"" AND "Message:\"Burst Your Bubble \<Meaning\>: To ruin someone's happy moment.\"" AND "Message:\"No Ifs, Ands, or Buts \<Meaning\>: Finishing a task without making any excuses.\"" AND "Message:\"Tough It Out \<Meaning\>: To remain resillient even in hard times; enduring.\"" OR "Message:\"Curiosity Killed The Cat \<Meaning\>: Typically said to indicate that any further investigation into a situation may lead to harm.\"" OR "Message:\"A Chip on Your Shoulder \<Meaning\>: Being angry about something that happened in the past.\"" OR "Message:\"A Cold Day in July \<Meaning\>: Something that is highly unlikely to happen.\"" OR "Message:\"Cry Over Spilt Milk \<Meaning\>: It's useless to worry about things that  already happened and cannot be changed.\"" OR "Message:\"A Leg Up \<Meaning\>: Someone who's given an advantage over others.\"" OR "Message:\"It's Not Brain Surgery \<Meaning\>: A task that's easy to accomplish, a thing lacking complexity.\"" OR "Message:\"You Can't Judge a Book By Its Cover \<Meaning\>: Don't judge someone or something only by the outward appearance.\"" AND "Message:\"Down For The Count \<Meaning\>: Someone or something that looks to be defeated, or nearly so.\"" OR "Message:\"Yada Yada \<Meaning\>: A way to notify a person that what they're saying is predictable or boring.\"" AND "Message:\"Let Her Rip \<Meaning\>: Permission to start, or it could mean 'go faster!'\"" OR "Message:\"Wouldn't Harm a Fly \<Meaning\>: Nonviolent; someone who is mild or gentle.\"" OR "Message:\"Off One's Base \<Meaning\>: A person that is crazy or behaving in idiotic ways\"" AND "Message:\"Close But No Cigar \<Meaning\>: Coming close to a successful outcome only to fall short at the end.\"" AND "Message:\"It's Not All It's Cracked Up To Be \<Meaning\>: 
Failing to meet expectations; not being as good as people say.\"" AND "Message:\"What Am I, Chopped Liver? \<Meaning\>: A rhetorical question used by a person who feels they are being given less consideration than someone else.\"" AND "Message:\"A Dog in the Manger \<Meaning\>: Someone who prevents others from using valuable items even though they have no need for them.\"" AND "Message:\"A Bite at the Cherry \<Meaning\>: An opportunity that's not available to most people.\"" OR "Message:\"Don't Count Your Chickens Before They Hatch \<Meaning\>: Do not rely on something you are not sure of.\"'''
import re
selection = [ item for item in re.split(' OR ', message)[1:-1] 
                  if ' AND ' in item ]
print(*selection, sep='\n')

giving

"Message:"Birds of a Feather Flock Together \<Meaning\>: People tend to associate with others who share similar interests or values."" AND "Message:"Up In Arms \<Meaning\>: Angry; being roused to the point that you are ready to fight.""
"Message:"Know the Ropes \<Meaning\>: Having a familiarity or understanding of how something works."" AND "Message:"Poke Fun At \<Meaning\>: Making fun of something or someone; ridicule."" AND "Message:"Give a Man a Fish \<Meaning\>: It's better to teach a person how to do something than to do that something for them."" AND "Message:"Money Doesn't Grow On Trees \<Meaning\>: Suggests that money is a resource that must be earned and is not one that's easily acquired."" AND "Message:"There's No I in Team \<Meaning\>: To not work alone, but rather, together with others in order to achieve a certain goal."" AND "Message:"A Busy Bee \<Meaning\>: An industrious person."" AND "Message:"Wake Up Call \<Meaning\>: An occurance of sorts that brings a problem to somebody's attention and they realize it needs fixing."" AND "Message:"A Lot on One's Plate \<Meaning\>: A lot \(or too much\) to do or cope with."" AND "Message:"Under the Weather \<Meaning\>: Not feeling well, in health or mood.""
"Message:"An Arm and a Leg \<Meaning\>: Something that is extremely expensive."" AND "Message:"Jaws of Death \<Meaning\>: Being in a dangerous or very deadly situation.""
"Message:"Tug of War \<Meaning\>: It can refer to the popular rope pulling game or it can mean a struggle for authority."" AND "Message:"A Dime a Dozen \<Meaning\>: Something that is extremely common."" AND "Message:"In a Pickle \<Meaning\>: Being in a difficult predicament; a mess; an undesirable situation."" AND "Message:"Ring Any Bells? \<Meaning\>: Recalling a memory; causing a person to remember something or someone."" AND "Message:"When the Rubber Hits the Road \<Meaning\>: When something is about to begin, get serious, or put to the test."" AND "Message:"Burst Your Bubble \<Meaning\>: To ruin someone's happy moment."" AND "Message:"No Ifs, Ands, or Buts \<Meaning\>: Finishing a task without making any excuses."" AND "Message:"Tough It Out \<Meaning\>: To remain resillient even in hard times; enduring.""
"Message:"You Can't Judge a Book By Its Cover \<Meaning\>: Don't judge someone or something only by the outward appearance."" AND "Message:"Down For The Count \<Meaning\>: Someone or something that looks to be defeated, or nearly so.""
"Message:"Yada Yada \<Meaning\>: A way to notify a person that what they're saying is predictable or boring."" AND "Message:"Let Her Rip \<Meaning\>: Permission to start, or it could mean 'go faster!'""
"Message:"Off One's Base \<Meaning\>: A person that is crazy or behaving in idiotic ways"" AND "Message:"Close But No Cigar \<Meaning\>: Coming close to a successful outcome only to fall short at the end."" AND "Message:"It's Not All It's Cracked Up To Be \<Meaning\>: Failing to meet expectations; not being as good as people say."" AND "Message:"What Am I, Chopped Liver? \<Meaning\>: A rhetorical question used by a person who feels they are being given less consideration than someone else."" AND "Message:"A Dog in the Manger \<Meaning\>: Someone who prevents others from using valuable items even though they have no need for them."" AND "Message:"A Bite at the Cherry \<Meaning\>: An opportunity that's not available to most people.""

Please be aware that this approach (and also the approach with searching in the other answer) doesn't cover the case in which OR and AND occur within the message text as there are no checks making sure that OR and AND are found outside the quotation marks.
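One possible way around that limitation is to tokenize the text first, so that each quoted message is treated as a single unit. The following is only a sketch under stated assumptions: every message is a double-quoted string whose inner quotes are escaped with a backslash (as in the raw text of the question), and the names `TOKEN`, `and_groups` and `query` are made up for this illustration.

```python
import re

# A token is either a double-quoted string (backslash escapes allowed inside)
# or a bare AND/OR operator. Operators inside quotes are swallowed by the
# first alternative and therefore never seen as separate tokens.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|\b(?:AND|OR)\b', re.DOTALL)

def and_groups(query):
    """Return the AND-joined groups that sit between two OR operators."""
    groups, current = [], []
    for token in TOKEN.findall(query):
        if token == 'OR':
            groups.append(current)
            current = []
        elif token != 'AND':
            current.append(token)
    groups.append(current)
    # Skip the first and last groups (not enclosed by two ORs) and keep only
    # groups with more than one quoted string, i.e. with at least one AND.
    # The groups are re-joined with ' AND ', so spacing is normalized.
    return [' AND '.join(group) for group in groups[1:-1] if len(group) > 1]

# Hypothetical query with OR and AND hidden inside quoted strings.
query = r'"a" OR "b OR c" AND "d AND e" OR "f" OR "g \" OR \" h" AND "i" OR "j"'
for group in and_groups(query):
    print(group)
```

The sketch assumes that quoted strings inside a group are always separated by AND; malformed input would need extra checks.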

Now let's compare the approach using splitting with the approach using searching:

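import re
from time import perf_counter as T

# Time the splitting approach (a single run).
sT_1 = T()
selection_1 = [item for item in re.split(' OR ', message)[1:-1] if ' AND ' in item]
eT_1 = T()

# Time the searching approach (a single run).
sT_2 = T()
selection_2 = re.findall(r'(?<= OR )((?:(?! OR ).)+ AND (?:(?! OR ).)+)(?= OR )', message)
eT_2 = T()

# Both approaches must select the same items.
assert selection_1 == selection_2
# Prints: time difference, splitting time, searching time (in seconds).
print(f'{(eT_2-sT_2) - (eT_1-sT_1):8.6f}, {(eT_1-sT_1):8.6f}, {(eT_2-sT_2):8.6f}')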

which prints:

0.000300, 0.000150, 0.000450

showing that, in this single run, the splitting approach (0.000150 s) ran 3 times faster than the searching approach (0.000450 s).
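A single timed run is sensitive to noise, so the comparison could also be repeated with the standard `timeit` module. This is a minimal sketch, not a definitive benchmark: the short `message` below is a made-up stand-in for the real text, and `by_split` and `by_search` are names introduced here for the two approaches. The printed timings depend on the machine and are not shown.

```python
import re
import timeit

# Hypothetical short input; the real message text can be substituted.
message = 'm0 OR m1 AND m2 OR m3 OR m4 AND m5 AND m6 OR m7'

SEARCH = re.compile(r'(?<= OR )((?:(?! OR ).)+ AND (?:(?! OR ).)+)(?= OR )')

def by_split(text):
    # Split on the operator, drop the outer pieces, keep pieces with an AND.
    return [item for item in re.split(' OR ', text)[1:-1] if ' AND ' in item]

def by_search(text):
    # Find the same pieces with the tempered greedy token pattern.
    return SEARCH.findall(text)

# Both approaches must agree before their timings are worth comparing.
assert by_split(message) == by_search(message)

for func in (by_split, by_search):
    # Best of 5 repeats, each repeat timing 10,000 calls.
    best = min(timeit.repeat(lambda: func(message), number=10_000, repeat=5))
    print(f'{func.__name__}: {best:.4f} s per 10,000 calls')
```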

Claudio
  • That's fine, but the OP specifically asked for a regular expression, not for a solution that by some measure is the most efficient. Perhaps a regular expression is required as input to a library method, or maybe the OP just wants to better understand how regular expressions can be used. Sometimes questions with a regex tag ask for a regex that turns out to be so complex and intricate that no one in their right mind would use it in practise, but it still may produce answers that have educational value. – Cary Swoveland Nov 07 '22 at 09:51
  • @CarySwoveland you surely know yourself that answering only the OP's exact question isn't always appropriate. By the way, another guess, along with those you listed, is that the OP is just not aware of the possibility of using another (better) approach. And to achieve an educational effect, the explanations in the regex demo are usually not enough to grasp the general idea behind the solution. I can't yet clearly see how the complex regex in the other answer works and why it works. Maybe you can reply in the comments to the other answer to my request for an explanation? – Claudio Nov 07 '22 at 10:49
  • @CarySwoveland: by the way: using a complex regex without understanding why and how it does what it does can lead to problems with adjusting it by oneself if there are special cases when it doesn't deliver the expected result. And the severity of the potential problems and the probability of having side-effects rises with the complexity of the regex, right? – Claudio Nov 07 '22 at 10:58
  • I agree, but does not your first sentence apply to coding in general? Complex regexes can also be difficult and time-consuming to test. On the other hand, a complex regex can sometimes replace many, many lines of code. Another benefit is that some coders achieve personal satisfaction by deriving a mind-bending regex, which is not necessarily a bad thing, in part because that sort of thing may help forestall coding burnout. – Cary Swoveland Nov 07 '22 at 18:58
  • @CarySwoveland: after your explanations in the comments to the other answer (especially the idea of a pointer position before the start of and *between* the characters of the string) I can also better understand where the slow speed comes from. My conclusion from this: if a search can be split into successive independent parts that reduce the number of characters and cases to check, then a regular expression that runs character by character, testing complex conditions over long stretches of the string before progressing, will be slower, right? – Claudio Nov 07 '22 at 20:34