I am trying to build a 'simple' regular expression (in Java) that matches sentences like these:
I want to cook something
I want to cook something with chicken and cheese
I want to cook something with chicken but without onions
I want to cook something without onions but with chicken and cheese
I want to cook something with candy but without nuts within 30 minutes
Ideally, it should also match:
I want to cook something with candy and without nuts within 30 minutes
In these examples I want to capture the 'included' ingredients, the 'excluded' ingredients and the maximum 'duration' of the cooking procedure. Each of these three capturing groups is optional in the pattern and starts with a specific keyword (with, (but )?without, within), and each group should match anything UNTIL the next of those keywords is found. In addition, an ingredient list can contain several words, so in the second and fifth examples "chicken and cheese" should be captured by the named group 'included'.
Ideally, I would like to write a pattern similar to this one:
I want to cook something ((with (?<include>.+))|((but )?without (?<exclude>.+))|(within (?<duration>.+) minutes))*
This does not work, because the wildcards can also match the keywords: once the first keyword is matched, the greedy wildcard of the corresponding named capturing group consumes the rest of the sentence, including any further keywords.
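A minimal Java sketch that reproduces the problem, assuming the pattern above is applied to the whole sentence with Matcher.matches(). The class name GreedyCaptureDemo and the helper group(...) are hypothetical, added only for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyCaptureDemo {
    // The pattern from the question, unchanged, split over two string literals.
    static final Pattern PATTERN = Pattern.compile(
            "I want to cook something "
            + "((with (?<include>.+))|((but )?without (?<exclude>.+))|(within (?<duration>.+) minutes))*");

    // Hypothetical helper: returns the named group, or null if the sentence
    // does not match or the group did not take part in the match.
    static String group(String sentence, String name) {
        Matcher m = PATTERN.matcher(sentence);
        return m.matches() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        String s = "I want to cook something with chicken but without onions";
        // The greedy .+ after "with " runs to the end of the input, so the
        // keyword "without" never gets a chance to start the exclude group.
        System.out.println("include = " + group(s, "include")); // chicken but without onions
        System.out.println("exclude = " + group(s, "exclude")); // null
    }
}
```

The whole tail ends up in 'include' and 'exclude' stays null. The same happens the other way round for a sentence starting with "without", where 'exclude' swallows "onions but with chicken and cheese".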
I tried using a lookahead, for example something like this:
something ((with (?<IncludedIngredients>.*(?=but)))|(but )?without (?<ExcludedIngredients>.+))+
That regex matches "something with chicken but without onions" but does not match "something with chicken".
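A small Java sketch of that behaviour, assuming the lookahead pattern is applied with Matcher.matches(). The class name LookaheadDemo and the method describe(...) are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // The lookahead pattern from the question, unchanged.
    static final Pattern PATTERN = Pattern.compile(
            "something ((with (?<IncludedIngredients>.*(?=but)))|(but )?without (?<ExcludedIngredients>.+))+");

    // Hypothetical helper: summarizes what the two named groups captured.
    static String describe(String sentence) {
        Matcher m = PATTERN.matcher(sentence);
        if (!m.matches()) {
            // With no "but" later in the input, (?=but) can never succeed,
            // and "with chicken" does not fit the without-branch either.
            return "no match";
        }
        // IncludedIngredients stops right before "but", so it keeps the trailing space.
        return "included='" + m.group("IncludedIngredients")
                + "', excluded='" + m.group("ExcludedIngredients") + "'";
    }

    public static void main(String[] args) {
        System.out.println(describe("something with chicken but without onions"));
        System.out.println(describe("something with chicken"));
    }
}
```

The lookahead ties the 'with' branch to a following "but", which is why it only works when an exclusion is present.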
Is there a simple way to do this with a regular expression?
P.S. By a 'simple' solution I mean one where I don't have to spell out every possible combination of those keywords in a sentence and order the alternatives by the number of keywords each combination uses.