I have this (simplified) regex:
((\s(python|java)\s)?((\S+\s+and\s))?(\S+\s+(love|hate)))
I built it in RegExr and tested it on this sentence:
python and java love python love python and java java
Which matches:
python and java love
python love
python and java java
This is exactly what I wanted, so I implemented it in Python:
import re
regex = re.compile("((\s(python|java)\s)?((\S+\s+and\s))?(\S+\s+(love|hate)))")
string = "python and java love python love python and java java"
print(str(re.findall(regex,string)))
However, this gives:
[('python and java love', '', '', 'python and ', 'python and ', 'java love', 'love'), ('python love', '', '', '', '', 'python love', 'love')]
What causes this difference, and how can it be fixed?
Update 1
Using a raw string does not help either:
import re
regex = re.compile(r'((\s(python|java)\s)?((\S+\s+and\s))?(\S+\s+(love|hate)))')
string = "python and java love python love python and java java"
print(str(re.findall(regex,string)))
This still gives:
[('python and java love', '', '', 'python and ', 'python and ', 'java love', 'love'), ('python love', '', '', '', '', 'python love', 'love')]
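For comparison, here is a minimal sketch of two ways to read the results from the same pattern. It relies only on documented behaviour of the re module: when a pattern contains capture groups, re.findall returns one tuple of groups per match, while re.finditer yields match objects whose group(0) is the whole matched text. The variable names are illustrative, and this is not meant as a definitive explanation of the difference with RegExr.

```python
import re

# Same pattern and input as in the question.
pattern = re.compile(r'((\s(python|java)\s)?((\S+\s+and\s))?(\S+\s+(love|hate)))')
text = "python and java love python love python and java java"

# With capture groups present, findall returns one tuple of groups per match.
group_tuples = pattern.findall(text)

# finditer yields match objects; group(0) is the full text of each match.
full_matches = [m.group(0) for m in pattern.finditer(text)]

print(group_tuples)
print(full_matches)
```

Here the first element of each tuple is group 1, which wraps the whole pattern, so it equals the corresponding entry in full_matches.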
Update 2
I will use my actual regex (with different terms), because then I can say exactly what I want to match and what I don't:
"(?:\s(?:low|high)\s)?(?:\S+\s+and\s)?(\S+\s+stress|deficiency|limiting)"
What it should match:
low|high ANY_WORD stress|deficiency|limiting
ANY_WORD stress|deficiency|limiting
ANY_WORD and ANY_WORD stress|deficiency|limiting
ANY_WORD and ANY_WORD ANY_WORD stress|deficiency|limiting
(for the last one, only allow two words after "and" if stress, deficiency or limiting comes right after them)
What it shouldn't match:
stress|deficiency|limiting (so don't match these if nothing is in front of them)
low
high
ANY_WORD
ANY_WORD and ANY_WORD
Example lists:
match:
salt and water stress
photo-oxidative stress
salinity and high light stress
low-temperature stress
Cd stress
Cu deficiency
N deficiency
IMI stress
no match:
stress
deficiency
limiting
temperature and water
low
high
water and salt
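The two lists above can be checked mechanically. The following is a minimal sketch, not a definitive fix. ORIGINAL is the pattern from Update 2, unchanged. VARIANT is a hypothetical alternative that only wraps the three keywords in a non-capturing group, on the assumption that the preceding \S+\s+ is meant to apply to all of them and not just to stress. The helper check and the list names are made up for illustration.

```python
import re

# Pattern from Update 2, unchanged.
ORIGINAL = r"(?:\s(?:low|high)\s)?(?:\S+\s+and\s)?(\S+\s+stress|deficiency|limiting)"

# Hypothetical variant (an assumption, not a confirmed fix):
# the keywords are grouped so that \S+\s+ is required before each of them.
VARIANT = r"(?:\s(?:low|high)\s)?(?:\S+\s+and\s)?(\S+\s+(?:stress|deficiency|limiting))"

SHOULD_MATCH = [
    "salt and water stress", "photo-oxidative stress",
    "salinity and high light stress", "low-temperature stress",
    "Cd stress", "Cu deficiency", "N deficiency", "IMI stress",
]
SHOULD_NOT_MATCH = [
    "stress", "deficiency", "limiting", "temperature and water",
    "low", "high", "water and salt",
]

def check(pattern):
    """Return (missed, unexpected) phrases for the given pattern."""
    rx = re.compile(pattern)
    missed = [p for p in SHOULD_MATCH if rx.search(p) is None]
    unexpected = [p for p in SHOULD_NOT_MATCH if rx.search(p) is not None]
    return missed, unexpected

for name, pat in (("original", ORIGINAL), ("variant", VARIANT)):
    missed, unexpected = check(pat)
    print(name, "missed:", missed, "unexpected:", unexpected)
```

This only tests whether a match is found somewhere in each phrase (re.search), not which part of the phrase is matched, so a real test would probably also need to compare the matched span.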