I have this regex:
cont_we_re = r"((?!\S+\s?(?:(cbf|cd3|cbm|m3|m[\\\>\?et]?|f3|ft3)))(?:([0-9,\.]+){2,})(?:\s*(?:(lb|kg)\.?s?))?)"
Right now, the logic is: match any numeric chunk, optionally followed by kgs or lbs only, but don't match it if cbf, cd3, cbm, m3, etc. are found after the numeric chunk. It works perfectly for these sample cases:
s1 = "18300 kg 40344.6 lbs 25000 m3"
s2 = "18300kg 40344.6lbs 25000m3"
s3 = "18300 kg KO"
s4 = "40344.6 lb5 "
s5 = "40344.6 "
I'm using re.finditer() with the re.IGNORECASE flag, like this:
import re
for s in [s1, s2, s3, s4, s5]:
    all_val = [i.group().strip() for i in re.finditer(cont_we_re, s, re.IGNORECASE)]
    print(all_val)
This gives me the following output:
['18300 kg', '40344.6 lbs']
['18300kg', '40344.6lbs']
['18300 kg']
['40344.6 lb']
['40344.6']
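To make the parts of cont_we_re easier to follow, here is a sketch of the same pattern rewritten with re.VERBOSE so each piece can carry a comment. The name cont_we_re_verbose is hypothetical; the pattern text itself is copied unchanged from cont_we_re, only split across lines, so it should behave the same on the sample strings.

```python
import re

# Same pattern as cont_we_re, split up with re.VERBOSE so each part can be commented.
# In verbose mode, whitespace and #-comments outside character classes are ignored.
cont_we_re_verbose = re.compile(r"""
    (                                                     # group 1: the whole match
      (?!\S+\s?(?:(cbf|cd3|cbm|m3|m[\\\>\?et]?|f3|ft3)))  # reject if a volume unit (group 2) follows
      (?:([0-9,\.]+){2,})                                 # group 3: numeric chunk, at least 2 chars
      (?:\s*(?:(lb|kg)\.?s?))?                            # optional unit; group 4 holds lb/kg
    )
""", re.IGNORECASE | re.VERBOSE)

samples = ["18300 kg 40344.6 lbs 25000 m3", "18300kg 40344.6lbs 25000m3",
           "18300 kg KO", "40344.6 lb5 ", "40344.6 "]
for s in samples:
    print([m.group().strip() for m in cont_we_re_verbose.finditer(s)])
```

Note that group 4, (lb|kg), already records which unit was matched, which is useful for the priority rule.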
Now I'm trying to implement another rule: if a numeric chunk followed by lbs is found, it takes first priority and only that match is returned; if no lbs match is found, the numeric chunks followed by kgs are taken instead, and if there are none of those either, the bare numeric chunks.
I've done this without changing the regex, like this:
for s in [s1, s2, s3, s4, s5]:
    all_val = [i.group().strip() for i in re.finditer(cont_we_re, s, re.IGNORECASE)]
    # Split the matches by unit, case-insensitively like the main regex
    kg_val = [i for i in all_val if re.search(r"kg\.?s?", i, re.IGNORECASE)]
    lb_val = [i for i in all_val if re.search(r"lb\.?s?", i, re.IGNORECASE)]
    # Priority: lbs first, then kgs, then bare numbers
    final_val = lb_val if lb_val else (kg_val if kg_val else list(set(all_val) - (set(kg_val+lb_val))))
    print(final_val)
This gives me exactly the output I want:
['40344.6 lbs']
['40344.6lbs']
['18300 kg']
['40344.6 lb']
['40344.6']
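One way to avoid the extra re.findall calls, sketched under the assumption that cont_we_re stays exactly as written: its fourth capturing group, (lb|kg), already tells you which unit each match carried, so a single finditer pass can bucket the matches by unit. The function name pick_by_unit_priority is hypothetical.

```python
import re

cont_we_re = r"((?!\S+\s?(?:(cbf|cd3|cbm|m3|m[\\\>\?et]?|f3|ft3)))(?:([0-9,\.]+){2,})(?:\s*(?:(lb|kg)\.?s?))?)"

def pick_by_unit_priority(s):
    """Return lbs matches if any, else kgs matches, else bare numeric matches."""
    buckets = {"lb": [], "kg": [], None: []}
    for m in re.finditer(cont_we_re, s, re.IGNORECASE):
        # Group 4 captures the unit ("lb"/"kg" in any case), or is None if no unit followed.
        unit = m.group(4).lower() if m.group(4) else None
        buckets[unit].append(m.group().strip())
    # Fall back in priority order: lbs, then kgs, then bare numbers.
    return buckets["lb"] or buckets["kg"] or buckets[None]

for s in ["18300 kg 40344.6 lbs 25000 m3", "18300kg 40344.6lbs 25000m3",
          "18300 kg KO", "40344.6 lb5 ", "40344.6 "]:
    print(pick_by_unit_priority(s))
```

The priority decision still happens in Python, but each string is scanned only once by cont_we_re. Unlike the set() in the original code, the bare-number fallback keeps the matches in order and keeps duplicates.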
The question is: how can I apply this same logic within the regex itself, without searching for kgs and lbs separately in each match returned by cont_we_re for each string? I tried an "IF-THEN-ELSE" type regex as portrayed in this question, but it doesn't work, because the first part of the regex, (?, supposedly yields a pattern error in Python. Is there any way to do this with only the cont_we_re regex?
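A quick check of what Python's re accepts, assuming the failed attempt used a lookaround as the IF condition (the exact pattern is not shown here, so the lookaround pattern below is a hypothetical stand-in): re only supports conditionals of the form (?(id/name)yes-pattern|no-pattern), where the condition is a group number or name, so a lookaround condition is rejected at compile time with re.error.

```python
import re

# Hypothetical lookaround used as the condition: Python's re rejects this when compiling.
try:
    re.compile(r"(?(?=\d)\d+|[a-z]+)")
    lookaround_condition_ok = True
except re.error as exc:
    lookaround_condition_ok = False
    print("re.error:", exc)

# A group-based condition is accepted: ">" is required only if group 1 ("<") matched.
bracketed = re.compile(r"^(<)?\w+(?(1)>)$")
print(bool(bracketed.match("<abc>")), bool(bracketed.match("abc")), bool(bracketed.match("<abc")))
```

So with re, any IF-THEN-ELSE has to be keyed on whether an earlier capturing group took part in the match, not on a lookahead.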