How to extract text between 2 words in PostgreSQL?

Question

I have a column `criteria` whose rows each contain text similar to the following:

inclusion : ajjsdijd
sdsjdjs
ieroeito trorg inclusion
sdkjwedk

exclusion :
sdkjwdowek
 ksdldk exclusion
skdkefk
kfkwkfwe

I want to extract the text between the first occurrence of `inclusion` and the first occurrence of `exclusion`. For the text above, I want the result to be:

ajjsdijd
sdsjdjs
ieroeito trorg inclusion
sdkjwedk

I also want to extract the text after the first `exclusion` keyword:

sdkjwdowek
 ksdldk exclusion
skdkefk
kfkwkfwe

I am currently using the PostgreSQL below, but it causes a problem: it picks the text between the first `inclusion` and the last `exclusion`.

substring(lower(criteria) from 'inclusion(.+)exclusion')
substring(lower(criteria) from 'exclusion(.+)')

Try using a lazy quantifier instead: `(.+?)` – Mateus Jul 29 '17 at 18:39

Jonathan Willcock · Accepted Answer · score 2 · 2017-07-29T18:56:22.067

You could try something like this:

DO $$
DECLARE
  input1  text;
  output1 text;
  output2 text;
  posincl integer;
  posexcl integer;
BEGIN
  input1 := 'inclusion : ajjsdijd
  sdsjdjs
  ieroeito trorg inclusion
  sdkjwedk

  exclusion :
sdkjwdowek
 ksdldk exclusion
skdkefk
kfkwkfwe';
  -- position() returns the 1-based index of the FIRST occurrence (0 if not found)
  posincl := position('inclusion :' in input1);
  posexcl := position('exclusion :' in input1);
  -- 11 = length('inclusion :') = length('exclusion :'), so +11 skips past each marker
  output1 := substring(input1 from (posincl + 11) for (posexcl - posincl - 11));
  output2 := substring(input1 from (posexcl + 11));
  RAISE NOTICE 'Value of output1: %', output1;
  RAISE NOTICE 'Value of output2: %', output2;
END $$;
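The same position()/substring() arithmetic can also run over every row of a table in a plain query, rather than over a single literal inside a DO block. This is a minimal sketch, not the answerer's code: the table name `trials` and its rows are hypothetical, `length()` stands in for the hard-coded 11, and NULLIF turns a missing marker (position 0) into NULL instead of letting it produce a negative substring length, which PostgreSQL rejects with an error.

```sql
-- Hypothetical table standing in for the asker's data
CREATE TEMP TABLE trials (id int, criteria text);
INSERT INTO trials VALUES
  (1, 'inclusion : ajjsdijd
sdsjdjs
ieroeito trorg inclusion
sdkjwedk

exclusion :
sdkjwdowek
 ksdldk exclusion
skdkefk
kfkwkfwe'),
  (2, 'inclusion : no exclusion marker in this row');

SELECT t.id,
       -- from just after the first 'inclusion :' up to the first 'exclusion :'
       substring(t.criteria from p.inc_end for p.exc_start - p.inc_end) AS inclusion_text,
       -- everything after the first 'exclusion :'
       substring(t.criteria from p.exc_end) AS exclusion_text
FROM trials AS t
CROSS JOIN LATERAL (
  -- position() finds the FIRST occurrence; NULLIF maps "not found" (0) to NULL
  SELECT NULLIF(position('inclusion :' in t.criteria), 0) + length('inclusion :') AS inc_end,
         NULLIF(position('exclusion :' in t.criteria), 0)                          AS exc_start,
         NULLIF(position('exclusion :' in t.criteria), 0) + length('exclusion :') AS exc_end
) AS p;
```

Row 1 returns both fragments, still carrying the surrounding spaces and newlines (btrim() can strip them). Row 2 returns NULL in both columns. A row where `exclusion :` comes before `inclusion :` would still raise the negative-length error, so such data would need an extra check.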

edited Jul 29 '17 at 18:56

answered Jul 29 '17 at 18:52

Got me by 10s... – Mateus Jul 29 '17 at 18:53
@MateusA. And just as I was about to go to bed too ... – Jonathan Willcock Jul 29 '17 at 18:53
+1 for the whole code and the time. By the way, the text at the top shouldn't be formatted as code (`You could try something like this:`) – Mateus Jul 29 '17 at 18:55

@MateusA. Thanks corrected. Definitely time to switch off!!!! – Jonathan Willcock Jul 29 '17 at 18:57

score 2 · Answer 2 · edited Sep 23 '17 at 18:16

This happens because you're using a greedy quantifier.

Repetition in regex by default is greedy: they try to match as many reps as possible, and when this doesn't work and they have to backtrack, they try to match one fewer rep at a time, until a match of the whole pattern is found. As a result, when a match finally happens, a greedy repetition would match as many reps as possible. -polygenelubricants

To fix it, switch to a lazy quantifier by adding the `?` operator after it:

/inclusion(.+?)exclusion/
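In PostgreSQL the pattern is written without the slash delimiters and passed to `substring(... from ...)`, which returns the part matched by the first parenthesized group. The sketch below compares the two quantifiers on a shortened, made-up sample (by default PostgreSQL's `.` also matches newlines, so the match can span lines):

```sql
WITH sample(criteria) AS (
  VALUES ('inclusion : aaa
inclusion bbb
exclusion :
ccc exclusion
ddd')
)
SELECT
  -- greedy: the capture runs to the LAST 'exclusion'
  substring(lower(criteria) from 'inclusion(.+)exclusion')  AS greedy,
  -- lazy: the capture stops at the FIRST 'exclusion' after the first 'inclusion'
  substring(lower(criteria) from 'inclusion(.+?)exclusion') AS lazy,
  -- greedy is fine here: the match still starts at the FIRST 'exclusion'
  substring(lower(criteria) from 'exclusion(.+)')           AS after_exclusion
FROM sample;
```

As the last column shows, the question's second expression (`'exclusion(.+)'`) already behaves as intended. Only the first one needs the lazy quantifier.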

Try this demo: https://regex101.com/r/TYGBrA/1. Note that with your regex the colon after the keyword in your input ends up in the captured text; it could also be skipped with the sequence \s*:\s*.
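One PostgreSQL-specific caveat about putting `\s*:\s*` into the same pattern: according to the regex matching rules in the PostgreSQL documentation, a pattern's overall greediness follows its first quantifier. That would make `'inclusion\s*:\s*(.+?)exclusion'` greedy as a whole, so it would again run to the last `exclusion`. A sketch that avoids this by keeping the lazy pattern and stripping the colon in a second step with `regexp_replace` (the sample string is shortened and made up):

```sql
WITH sample(criteria) AS (
  VALUES ('inclusion : aaa
inclusion bbb
exclusion :
ccc exclusion
ddd')
)
SELECT
  -- step 1: lazy match up to the first 'exclusion'
  -- step 2: drop the leading ' : ', then trim trailing spaces/newlines
  btrim(regexp_replace(substring(lower(criteria) from 'inclusion(.+?)exclusion'),
                       '^\s*:\s*', ''),
        E' \n') AS inclusion_text,
  -- same colon clean-up for the text after the first 'exclusion'
  regexp_replace(substring(lower(criteria) from 'exclusion(.+)'),
                 '^\s*:\s*', '') AS exclusion_text
FROM sample;
```

The clean-up pattern `'^\s*:\s*'` contains only greedy quantifiers and is anchored at the start, so it just removes the colon and the whitespace around it.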