I have the following table:
CREATE TABLE test_regex (
drug TEXT
);
INSERT INTO test_regex
VALUES
('DRUGA 200 MG'),
('DRUGB 150 MCG'),
('DRUGC 1.5 GM BOX'),
('DRUGD 27.2 MG/5 ML VIAL')
;
I want to extract the following:
200 MG
150 MCG
1.5 GM
27.2 MG
What I have tried so far:
SELECT
substring(drug, '[0-9]*\.?[0-9]* MG|GM|MCG')
FROM
test_regex
;
Which results in:
200 MG
MCG
GM
27.2 MG
I've also tried:
SELECT
substring(drug, '[0-9]*\.?[0-9]* (MG|GM|MCG)')
FROM
test_regex
;
With this result:
MG
MCG
GM
MG
I think the problem lies in how the trailing (MG|GM|MCG)
group is handled but I couldn't find what I was looking for in the PostgreSQL Docs. I was expecting to get the first number section, then a space, then either MG, GM or MCG. I think its getting grouped as the number expression with MG, then either GM or MCG.