I'm unclear on how negative lookaheads in regular expressions work. I followed several posts (post 1, post 2) and used their patterns, and they work, but their explanations do not make sense to me. I also tried several regex tester sites, such as regex101, but they fail to process some patterns that appear to work in Python according to posts 1 and 2.
I would like regex to treat negative logic the same way it treats positive logic. However, as soon as negative logic is involved, it seems to switch to a whole different way of processing that is hard to follow. I know there are workarounds, but I'm interested in understanding how to do this with regex itself.
Objective for the examples below: given a list of commodities, I would like to get the ones that are not "gas", as defined by the variable `expect`. In other words, I need a list of products whose names do not contain the word "gas" (in any letter case).
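One common workaround is to search for the positive pattern and negate the result in Python. The minimal sketch below compares that with a single regex that uses a lookahead anchored at the start of the string. It assumes case-insensitive matching is wanted (passed as `re.IGNORECASE` instead of an inline `(?i)`) and that the names contain no newlines; the variable names `negated` and `lookahead` are only illustrative.

```python
import re

cmdty = ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']

# Workaround: look for "gas" and negate the answer in Python.
negated = [c for c in cmdty if not re.search(r'gas', c, re.IGNORECASE)]

# Regex-only version: at the start of the string (^), the lookahead scans
# the rest of the line (.*) and fails if "gas" appears anywhere in it.
lookahead = [c for c in cmdty if re.search(r'^(?!.*gas)', c, re.IGNORECASE)]

print(negated)    # ['Crude Oil', 'Brent', 'WTI']
print(lookahead)  # ['Crude Oil', 'Brent', 'WTI']
```

The regex-only version produces an empty match at position 0 for the kept names, which is enough because a match object is truthy even when the matched text is empty.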
Here is some helper code to try different ideas:
```python
import re

cmdty = ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']
expect = cmdty[-3:]  # i.e. ['Crude Oil', 'Brent', 'WTI']
print(f'Starting list: {cmdty}. Would like to get: {expect}')

def check(pattern, cmdty=cmdty, expect=expect, comment=""):
    # Keep every item in which re.search() finds a match anywhere in the string.
    out = [c for c in cmdty if re.search(pattern, c)]
    good = "yes" if set(out) == set(expect) else "no"
    print(f'pattern={pattern:20}: worked: {good:>3}. output={out}. comment: {comment}')
```
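A possible reason some testers reject patterns such as `^(?i)(?!.*gas).*$`: `(?i)` is a global flag, and Python has deprecated placing global flags anywhere but the very start of a pattern since 3.6; from Python 3.11 on, doing so raises `re.error`. The sketch below is a hypothetical variant of `check()` (the name `check_ci` is made up for illustration) that passes `re.IGNORECASE` instead, so the flag's position in the pattern no longer matters.

```python
import re

cmdty = ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']
expect = cmdty[-3:]

def check_ci(pattern, cmdty=cmdty, expect=expect, comment=""):
    # Hypothetical variant of check(): case-insensitivity comes from
    # re.IGNORECASE instead of an inline (?i), so its position is irrelevant.
    rx = re.compile(pattern, re.IGNORECASE)
    out = [c for c in cmdty if rx.search(c)]
    good = "yes" if set(out) == set(expect) else "no"
    print(f'pattern={pattern:20}: worked: {good:>3}. output={out}. comment: {comment}')
    return out

# Compiling a pattern that has (?i) after '^':
try:
    re.compile(r'^(?i)(?!.*gas).*$')
    print('accepted (Python < 3.11 only issues a DeprecationWarning)')
except re.error as exc:
    print('rejected:', exc)  # Python 3.11+

check_ci(r'^(?!.*gas).*$', comment='same pattern, flag passed as an argument')
```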
Various regular expressions I tried:
```python
check(pattern='(?i)(?=gas)', comment="This one works, but requires negating the results")
check(pattern='(?i)(?!gas)', comment="My hope was that this would work")
check('(?i)(?:!gas)', comment="")
check(r'(?i)\s(?!gas)', comment="strange outcome")  # raw string avoids the invalid "\s" escape warning
check('(?i).*(?!gas).*')
# Note: '^(?i)...' puts a global flag after '^'; Python 3.11+ rejects this with re.error.
check('^(?i)(?!.*gas).*$', comment='works')
check('^(?i)((?!gas).)*$', comment='not sure this one works')
check('(?i)^.*(?!gas).*$', comment="I'd expect this one to work, but does not")
check('(?i)^(?!.*gas).*$', comment='works')
check('(?i)nat(?!gas)', comment='makes sense, but super odd')
```
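One way to see what `(?=gas)` and `(?!gas)` do is to list every position at which each of them succeeds. This sketch uses `re.finditer`, which also reports empty (zero-width) matches. At any single position the two assertions are exact opposites, which is the symmetry one would hope for; the asymmetry only appears because `re.search` succeeds as soon as any one position satisfies the pattern.

```python
import re

text = 'natural gas'

# A lookahead is zero-width: it tests one position and consumes nothing.
# Collect every position (0..len(text)) where each assertion succeeds.
pos_ahead = [m.start() for m in re.finditer(r'(?=gas)', text)]
pos_not_ahead = [m.start() for m in re.finditer(r'(?!gas)', text)]

print(pos_ahead)      # [8]
print(pos_not_ahead)  # [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11]

# re.search() only needs ONE position where the pattern succeeds,
# so (?!gas) is satisfied immediately at position 0.
m = re.search(r'(?!gas)', text)
print(m.span())       # (0, 0): an empty match at the start
```

So `(?!gas)` on its own asks "is there some position not followed by gas?", and almost every string has one. To ask "is gas absent everywhere?", the lookahead has to be pinned to the start with `^` and made to scan the rest of the line with `.*`, as in `^(?!.*gas)`.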
Output of the first print (starting list and objective):

```
Starting list: ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']. Would like to get: ['Crude Oil', 'Brent', 'WTI']
```
Here are the results of the various attempts. What is the right way to think about this so that it makes sense?
```
pattern=(?i)(?=gas) : worked: no. output=['natural gas', 'Henry Hub Natural Gas Contract']. comment: This one works, but requires negating the results
pattern=(?i)(?!gas) : worked: no. output=['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']. comment: My hope was that this would work
pattern=(?i)(?:!gas) : worked: no. output=[]. comment:
pattern=(?i)\s(?!gas) : worked: no. output=['Henry Hub Natural Gas Contract', 'Crude Oil']. comment: strange outcome
pattern=(?i).*(?!gas).* : worked: no. output=['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']. comment:
pattern=^(?i)(?!.*gas).*$ : worked: yes. output=['Crude Oil', 'Brent', 'WTI']. comment: works
pattern=^(?i)((?!gas).)*$ : worked: yes. output=['Crude Oil', 'Brent', 'WTI']. comment: not sure this one works
pattern=(?i)^.*(?!gas).*$ : worked: no. output=['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']. comment: I'd expect this one to work, but does not
pattern=(?i)^(?!.*gas).*$ : worked: yes. output=['Crude Oil', 'Brent', 'WTI']. comment: works
pattern=(?i)nat(?!gas) : worked: no. output=['natural gas', 'Henry Hub Natural Gas Contract']. comment: makes sense, but super odd
```
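To see why each attempt behaves as it does, it can help to print what `re.search` actually matched rather than only whether it matched. The sketch below does this for a selection of the patterns on three of the names. One change from the list above, made as an assumption for portability: in `^(?i)((?!gas).)*$` the `(?i)` is moved to the front so the pattern also compiles on Python 3.11+. The helper name `matched_text` is hypothetical.

```python
import re

samples = ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil']

patterns = [
    r'(?i)(?!gas)',          # empty match at position 0
    r'(?i)(?:!gas)',         # literal text "!gas": never present
    r'(?i)\s(?!gas)',        # first whitespace not followed by "gas"
    r'(?i)^.*(?!gas).*$',    # .* takes the whole line; (?!gas) then holds at the end
    r'(?i)nat(?!gas)',       # "nat" followed by "ural", which is not "gas"
    r'(?i)^(?!.*gas).*$',    # lookahead at the start scans the whole line
    r'(?i)^((?!gas).)*$',    # lookahead checked before consuming each character
]

def matched_text(pattern, s):
    """Return the text re.search() matched, or None if there was no match."""
    m = re.search(pattern, s)
    return m.group() if m else None

results = {(p, s): matched_text(p, s) for p in patterns for s in samples}

for p in patterns:
    row = ', '.join(f'{s!r} -> {results[p, s]!r}' for s in samples)
    print(f'{p:20} {row}')
```

In the output, only the last two patterns reject both gas names, because they are the only ones that require "gas" to be absent from the whole line: `^(?!.*gas)` scans the line once from the start, while `^((?!gas).)*$` tests the lookahead before consuming every single character up to `$`.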