1

Let me first mention that this a well discussed problem and I have gone through several thread including these two - which are closest match Regex to match a string not followed by some string and A regex to match a substring that isn't followed by a certain other substring but they did not solve my problem.

I have strings containing volume and quantity in several different formats -mentioned below, e.g. 6 X 200ml mean 6 packs of 200 milliliters each. I want to extract only the quantity like 6 in the this example

Examples

  1. blah 6 X 200ml -- 6
  2. blah 200 mlX 6 -- 6
  3. blah x 5000 ml -- 0 or better 1
  4. blah x 500000ml -- 0 or better 1
  5. blah 5mlX10 -- 10
  6. blah 500 mlX 10 -- 10

This is what I've tried so far without any success

(X\s*\d+|\d+\s*X)(?!\s*ml)

it matches case #3 and 4 as well which shouldn't be matched. I am also fine with extracting quantity like 6 with multiplication sign e.g 6 X instead of just 6. I can replace it.

Ali
  • 7,810
  • 12
  • 42
  • 65
  • 1
    won't [**`this`**](https://regex101.com/r/riq5Sw/1) work – rock321987 Jan 29 '18 at 03:28
  • Nope. I get the following error when I include a negative loo-ahead for example. Error: Invalid preceding regular expression prior to repetition operator. The error occured while parsing the regular expression fragment: '*(\d)\b(?>>>HERE>>>!\s+ml)) ... – Ali Jan 29 '18 at 07:01
  • What is your database (Oracle, MySql, PostgreSQL, MSSQL) ? REGEXP support is different in different databases, in many of them it is limited and does not support many features, for examle lookarounds. It is hard to help not knowing which database you are using. – krokodilko Jan 29 '18 at 07:07
  • Its Amazon Redshift and since redshift used PostgreSQL, it might be possible that if it works on Postgres, it will work on Redshift – Ali Jan 29 '18 at 07:19
  • @Gurman why have to removed your answer, it was a wonderful answer, just my mistake not mentioning its an SQL problem. Probably your answer will help someone else – Ali Jan 29 '18 at 07:20

1 Answers1

0

You didn't mention a database you are using in the question.
The SQL standard does not include regular expressions, so each database has its own regexp engine implementation, each of them is different and does not support many features of regular expressions, like lookarounds. It is hard to help you without knowing an exact database you are using.


The below are two simple examples how this problem can be solved in Oracle and PostgreSQL databases using
But this won't work on other databases than Oracle/PostgreSQL.



A query for Oracle:
Online demo: http://sqlfiddle.com/#!4/599c41/5

select t.*,
     regexp_substr( regexp_replace( "text", '\d+\s*ml', '///' ), '\d+' ) as x
from table1 t;

|              text |      X |
|-------------------|--------|
|  blah 6 X 200ml   |      6 |
|  blah 200 mlX 6   |      6 |
|  blah x 5000 ml   | (null) |
| blah x 500000ml   | (null) |
|     blah 5mlX10   |     10 |
| blah 500 mlX 10   |     10 |

If you want to replace NULLs by 0 or 1, you can use CASE EXPRESSIONs in this way:

select t.*,
     CASE WHEN regexp_substr( regexp_replace( "text", '\d+\s*ml', '///' ), '\d+' )
        IS NULL THEN '1' /* or 0 */
        ELSE regexp_substr( regexp_replace( "text", '\d+\s*ml', '///' ), '\d+' )
     END as x
from table1 t;

|              text |  X |
|-------------------|----|
|  blah 6 X 200ml   |  6 |
|  blah 200 mlX 6   |  6 |
|  blah x 5000 ml   |  1 |
| blah x 500000ml   |  1 |
|     blah 5mlX10   | 10 |
| blah 500 mlX 10   | 10 |

A query for PostgreSQL:

select t.*,
     substring( regexp_replace( "text", '\d+\s*ml', '///') from '\d+' ) as x
from table1 t;

|              text |      x |
|-------------------|--------|
|  blah 6 X 200ml   |      6 |
|  blah 200 mlX 6   |      6 |
|  blah x 5000 ml   | (null) |
| blah x 500000ml   | (null) |
|     blah 5mlX10   |     10 |
| blah 500 mlX 10   |     10 |

Online demo: http://sqlfiddle.com/#!17/b003b/1

krokodilko
  • 35,300
  • 7
  • 55
  • 79
  • Could you still help me find out the solution if I change the input? What if there is no no unit like 'ml' at the end; for example 'blah X 200' should return 0 or null as there is no quantity mentioned. Thanks – Ali Feb 17 '18 at 12:31