I would like to use the group capture in a regular expression to parse a string. Can this be done with MySQL?
For example,
SELECT my_wishful_function('->hello world<-', '/->(.*)<-/')
and have it return:
hello world
I would like to use the group capture in a regular expression to parse a string. Can this be done with MySQL?
For example,
SELECT my_wishful_function('->hello world<-', '/->(.*)<-/')
and have it return:
hello world
MySQL regex engine does not support backreferences, unlike other RDBMS such as Oracle or Postgres.
You can use regexp_replace()
instead. That is, remove the unwanted parts instead of capturing the wanted part:
select regexp_replace('->hello world<-', '(^->)|(<-$)', '')
Regexp explanation: ^->
is the unwanted part at the beginning of the string, and <-$
is the unwanted part at the end of the string. We surround the parts with parentheses to define groups, and separate them with |
, which means or.
select regexp_replace('->hello world<-', '(^->)|(<-$)', '') new_col
| new_col | | :---------- | | hello world |
An elegant solution is to use REGEXP_SUBSTR()
and positive lookahead/behind assertions :
SELECT REGEXP_SUBSTR('->hello world<-', '(?<=->).*(?=<-)');
-- hello world
(?<=...)
is a positive lookbehind assertion : we check matching pattern is preceded by ...
(?=...)
is a positive lookahead assertion : we check matching pattern is followed by ...
These assertions are not capturing, so we don't need to delete delimiters ->
and <-
It's compatible with MySQL 8
In MySQL 5.7, if the boundaries ->
and <-
only occur once in the string, you can use SUBSTRING_INDEX
twice to extract the portion between the boundaries:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('I said ->hello world<- today', '->', -1), '<-', 1)
Output
hello world