I want to find a better way to get my result. I use a regex pattern
to match all text of the form (DD+ some text DDDD some other text)
if and only if it is not preceded by certain terms. Those terms can appear at any distance before the match, so this would need a non-fixed-width lookbehind.
How can I include these terms inside my regex pattern?
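import re
import pandas as pd

aa = pd.DataFrame({"test": ["45 python 00222 sometext",
                            "python white 45 regex 00 222 somewhere",
                            "php noise 45 python 65000 sm",
                            "otherword 45 python 50000 sm"]})
# Raw string so the backslash escapes reach the regex engine unchanged
pattern = re.compile(r"(((\d+)\s?([^\W\d_]+)\s?)?(\d{2}\s?\d{3})\s?(\w.+))")
# Step 1: keep the matched text, or None when the pattern does not match (search runs once per row)
aa["result"] = aa["test"].apply(lambda x: m[0] if (m := pattern.search(x)) else None)
lookbehind = ['python', 'php']
# Step 2: blank the result when one of the terms appears outside the matched text
aa.apply(lambda x: "" if any(look in x["test"].replace(x["result"], "") for look in lookbehind) else x["result"], axis=1)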
The output is what I expected:
0 45 python 00222 sometext
1
2
3 45 python 50000 sm
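For comparison, here is a rough sketch of how the two steps above might be folded into one pattern. Python's built-in `re` module only accepts fixed-width lookbehinds, so this sketch does not use a lookbehind at all. It anchors at the start of the string and lets a prefix consume characters only while none of the terms starts at that position. The names `FORBIDDEN`, `CORE`, `guarded` and `extract` are made up for illustration. The sketch assumes the terms only matter when they occur before the match, and it returns `None` rather than an empty string for rejected rows.

```python
import re

# Terms that must not appear before the match (same values as the `lookbehind` list)
FORBIDDEN = ["python", "php"]

# Same core pattern as in the question, with the inner groups made non-capturing
CORE = r"(?:\d+\s?[^\W\d_]+\s?)?\d{2}\s?\d{3}\s?\w.+"

# Builds "^(?:(?!python|php).)*?(CORE)": the lazy prefix advances one character
# at a time, and only while no forbidden term starts at that character.
forbidden_alt = "|".join(re.escape(term) for term in FORBIDDEN)
guarded = re.compile(rf"^(?:(?!{forbidden_alt}).)*?({CORE})")


def extract(text):
    """Return the matched text, or None if there is no match or a forbidden term precedes it."""
    m = guarded.search(text)
    return m.group(1) if m else None


samples = [
    "45 python 00222 sometext",
    "python white 45 regex 00 222 somewhere",
    "php noise 45 python 65000 sm",
    "otherword 45 python 50000 sm",
]
results = [extract(s) for s in samples]
print(results)
```

On the four sample strings this keeps rows 0 and 3 and rejects rows 1 and 2, which agrees with the output shown above. With a DataFrame, the same function could be passed to `aa["test"].apply(extract)`.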