I'm trying to extract the volume of a string field to sort by it numerically.
Given the following data:
- Something at 300 ml
- 300Ml somthing
- Something special (with 300 Ml)
- 8-v something that should not match
First attempt:
Just cast the string to unsigned
(found in this thread: reference)
Problem: It'll obviously also take the 4th data example into account.
Second attempt: Use REGEXP_REPLACE to extract the value (found in this thread: reference)
CAST(
REGEXP_REPLACE(
LOWER(column),
"[:digit:]+[:space:]*ml",
"///"
) as UNSIGNED
) AS volume
...using the character_class
es defined in the manual.
Problem: The character_class
es seem not to work as intended (possibly an issue with the character class name for spaces?).
Third attempt: Using the common Regexp tokens
CAST(
REGEXP_REPLACE(
LOWER(column),
"\d+\s*ml",
"///"
) as UNSIGNED
) AS volume
...which seems to work better.
Problem: It also matches the 4th data example again.
Question: How do I perform some kind of positive lookahead in MySQL REGEXP_REPLACE that will only match data examples 1-3 (ignoring the case sensitiveness due to casting to lower)?
The one i've tried seem to work fine in PCRE (with global flag) but not in MySQL:
^.*\d+\s*ml.*$
(ignoring the missing capturing group)
Which is translated like: Match everything until 1 or more digits, followed by 0 or more whitespaces and the string ml
is found.