I am trying to extract a set of keywords such as ['lemon', 'apple', 'coconut'] from paths (stored in the `Paths` column of a pandas DataFrame) such as "/var/prj/lemon_123/xyz", "/var/prj/123_apple/coconut", "/var/prj/lemonade/coconutapple" and "/var/prj/apple/lemon".
The expected output is a little more complex:
Paths | MatchedKeywords |
---|---|
"/var/prj/lemon_123/xyz" | lemon |
"/var/prj/123_apple/coconut" | apple, coconut |
"/var/prj/lemonade/coconutapple" | |
"/var/prj/apple/lemon" | apple, lemon |
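Keep in mind that the third row has no match because none of the keywords appears there as a whole word: 'lemon' is followed by 'ade', 'coconut' is followed by 'apple', and 'apple' is preceded by 't'. A keyword should only count when it is preceded by /, whitespace (\s), a digit (\d) or _ and followed by one of those characters or the end of the string. Roughly, the pattern I have in mind is [\s\d_/]keyword[\s\d_/]. I tried using: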
```python
import re
df['Paths'].str.findall(r'[^\s\d_/]lemon|apple|coconut[\s\d_/$]', flags=re.IGNORECASE)
```
But it is still showing 'lemon' and 'coconut' in the third row.
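A likely cause is operator precedence: `|` binds more loosely than anything else in a regex, so the attempted pattern is read as three separate alternatives, `[^\s\d_/]lemon`, `apple` and `coconut[\s\d_/$]`. The boundary checks therefore never apply to `apple` and cover only one side of `lemon` and `coconut`. On top of that, `[^...]` matches any character *except* the listed ones, and `$` inside a character class is a literal dollar sign, not end-of-string. Below is a minimal sketch, using only the standard `re` module, that groups the alternation and uses lookarounds so the delimiters are checked on both sides without being consumed. The helper names `build_pattern` and `match_keywords` are hypothetical, and the sketch assumes the keywords are plain words (they are passed through `re.escape`).

```python
import re

# Characters allowed around a keyword: whitespace, digits, '_' and '/'.
DELIMS = r"\s\d_/"

def build_pattern(keywords):
    """Build a regex string matching any keyword as a delimiter-bounded token."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    # (?<![^...]) = "not preceded by a non-delimiter": a delimiter or string start.
    # (?![^...])  = "not followed by a non-delimiter": a delimiter or string end.
    # The (?:...) group makes both checks apply to every keyword.
    return rf"(?<![^{DELIMS}])(?:{alternatives})(?![^{DELIMS}])"

def match_keywords(path, pattern):
    """Return the keywords found in `path`, in order of appearance."""
    return re.findall(pattern, path, flags=re.IGNORECASE)

pattern = build_pattern(["lemon", "apple", "coconut"])
paths = [
    "/var/prj/lemon_123/xyz",
    "/var/prj/123_apple/coconut",
    "/var/prj/lemonade/coconutapple",
    "/var/prj/apple/lemon",
]
for path in paths:
    print(path, match_keywords(path, pattern))
```

Because the lookarounds do not consume characters, `coconut` at the very end of the second path still matches, and a shared delimiter such as the `_` in `lemon_apple` can serve both neighbouring keywords. The same pattern string should also work with pandas, e.g. `df['Paths'].str.findall(pattern, flags=re.IGNORECASE).str.join(', ')`, which turns each list into the comma-separated form shown in the table (an empty list becomes an empty string). If the real paths use backslashes, add `\\` to both character classes.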
Thank you in advance.