I'm trying to write a regex that captures everything up to the first occurrence of certain words; if none of these words is present in the text, it should just capture everything. In this example, let's consider this group of words: ['ABC', 'HIJ', 'TUV']

I have no ideia ABC about who i am
I have no ideia

I may have an idea about who you HIJ think you are
I may have an idea about who you

Sometimes i just wish you are not here
Sometimes i just wish you are not here

It should capture everything up to one of the words I defined, but if none of the words is present, as in the last string, it should capture everything.

My attempt:

(.*)(?:ABC|HIJ|TUV|$)

But it always gets the entire string, even when the string contains one of the words in the group.

P.S.: I'm applying this in Python.

JohnyL
    You're very close, just add `?` to make `*` non-greedy: [`^(.*?)(?:ABC|HIJ|TUV|$)`](https://regex101.com/r/H2t4XM/1/) – Hao Wu May 07 '21 at 05:36
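
To illustrate the difference the comment describes, here is a minimal sketch that runs the greedy pattern from the question and the lazy pattern from the comment against the three sample strings, one string at a time. The names `samples`, `greedy`, `lazy`, `greedy_results`, and `lazy_results` are made up for this illustration.

```python
import re

# The three input strings from the question.
samples = [
    "I have no ideia ABC about who i am",
    "I may have an idea about who you HIJ think you are",
    "Sometimes i just wish you are not here",
]

# Greedy: (.*) first consumes the whole string, and the `$` alternative
# then succeeds at the end, so the engine never needs to backtrack to a word.
greedy = re.compile(r"(.*)(?:ABC|HIJ|TUV|$)")

# Lazy: (.*?) grows one character at a time and stops as soon as one of
# the words (or the end of the string) can be matched.
lazy = re.compile(r"^(.*?)(?:ABC|HIJ|TUV|$)")

greedy_results = [greedy.match(s).group(1) for s in samples]
lazy_results = [lazy.match(s).group(1) for s in samples]

for g, l in zip(greedy_results, lazy_results):
    print(repr(g), "->", repr(l))
```

Note that the lazy capture keeps the space that precedes the word (for example `'I have no ideia '`); calling `.rstrip()` on the result would remove it if that matters.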

1 Answer

With your shown samples, please try the following, which uses Python's re.findall function.

import re

# Words that mark where each capture should stop.
lst = ['ABC', 'HIJ', 'TUV']
var = """ have no ideia ABC about who i am
I have no ideia

I may have an idea about who you HIJ think you are
I may have an idea about who you

Sometimes i just wish you are not here
Sometimes i just wish you are not here"""
# Builds (.*?)(?:ABC|HIJ|TUV|$): a lazy capture followed by a stop word or the end of the string.
regex = r'(.*?)(?:' + '|'.join(lst) + r'|$)'
print(re.findall(regex, var))
# Output:
# [' have no ideia ', 'I may have an idea about who you ', 'Sometimes i just wish you are not here', '']

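Explanation: The code imports Python's re library and stores the sample text in the variable var. The regex variable is built by joining the words in lst with | inside a non-capturing group. Finally, re.findall applies the regex to var and returns the text captured before each listed word. Because re.MULTILINE is not set, $ matches only at the end of the whole string and . does not match a newline, so a line without any of the words is returned only when it is the last line; the trailing '' is an empty match at the very end of the string.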

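Explanation of the regex '(.*?)(?:ABC|HIJ|TUV|$)': the lazy (non-greedy) quantifier in (.*?) captures as few characters as possible, stopping as soon as the non-capturing group (?:ABC|HIJ|TUV|$) can match one of the words in lst or the end of the string.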
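
As a possible variation on the same pattern, the sketch below wraps it in a helper that handles one string at a time and escapes the words first, so a word containing regex metacharacters (such as `+` or `.`) is matched literally. The function name `text_before_words`, its parameters, and the decision to strip trailing whitespace are assumptions made for this illustration, not part of the original answer; it also assumes the word list is non-empty.

```python
import re

def text_before_words(text, words):
    """Return the part of `text` before the first of `words`, or all of it.

    Hypothetical helper for illustration; assumes `words` is non-empty.
    """
    # re.escape makes each word match literally, even if it contains
    # characters that are special in a regex.
    alternation = "|".join(re.escape(w) for w in words)
    # re.DOTALL lets `.` cross newlines, so the `$` alternative can always
    # be reached and match() never returns None.
    pattern = re.compile(r"^(.*?)(?:" + alternation + r"|$)", re.DOTALL)
    # Strip the whitespace left in front of the stop word.
    return pattern.match(text).group(1).rstrip()

stop_words = ["ABC", "HIJ", "TUV"]
for line in [
    "I have no ideia ABC about who i am",
    "I may have an idea about who you HIJ think you are",
    "Sometimes i just wish you are not here",
]:
    print(repr(text_before_words(line, stop_words)))
```

Calling the helper once per input string sidesteps the newline behaviour of `$` and `.` that affects `re.findall` on a multi-line string.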

RavinderSingh13