-1

I have a column criteria which contains below similar text in each of its row:-

inclusion : ajjsdijd
sdsjdjs
ieroeito trorg inclusion
sdkjwedk

exclusion :
sdkjwdowek
 ksdldk exclusion
skdkefk
kfkwkfwe 

I want to extract the text between first inclusion and exclusion occurance. So, here i want result as

ajjsdijd
sdsjdjs
ieroeito trorg inclusion
sdkjwedk

Also, I want to extract the text after first exclusion keyword:

sdkjwdowek
 ksdldk exclusion
skdkefk
kfkwkfwe 

I am currently using below PostgreSQL but this creates issue and picks text between first inclusion and last exclusion..

substring(lower(criteria) from 'inclusion(.+)exclusion')
substring(lower(criteria) from 'exclusion(.+)')
halfer
  • 19,824
  • 17
  • 99
  • 186
user3222101
  • 1,270
  • 2
  • 24
  • 43

2 Answers2

2

You could try something like this:

DO $$
DECLARE input1 TEXT;
DECLARE output1 TEXT;
DECLARE output2 TEXT;
declare posincl integer;
declare posexcl integer;
BEGIN
  input1 := 'inclusion : ajjsdijd
  sdsjdjs
  ieroeito trorg inclusion
  sdkjwedk

  exclusion :
sdkjwdowek
 ksdldk exclusion
skdkefk
kfkwkfwe' ;
  posincl := position('inclusion :' in input1);
  posexcl := position('exclusion :' in input1);
  output1 := substring(input1 from (posincl + 11) for (posexcl - posincl - 11));  
  output2 := substring(input1 from (posexcl + 11)); 
  RAISE NOTICE 'Value of output1: %', output1;
  RAISE NOTICE 'Value of output2: %', output2;
END $$;
Jonathan Willcock
  • 5,012
  • 3
  • 20
  • 31
2

The reason why it happens is that you're using a Greedy quantifier.

Repetition in regex by default is greedy: they try to match as many reps as possible, and when this doesn't work and they have to backtrack, they try to match one fewer rep at a time, until a match of the whole pattern is found. As a result, when a match finally happens, a greedy repetition would match as many reps as possible. -polygenelubricants

What you have to do is change it to a Lazy quantifier, by adding the ? operator:

/inclusion(.+?)exclusion/

Try looking at this demo: https://regex101.com/r/TYGBrA/1 (Note the colon in your input with your given regex which could also be ignored with the sequence \s*:\s*).

Graham
  • 7,431
  • 18
  • 59
  • 84
Mateus
  • 4,863
  • 4
  • 24
  • 32