3

I would like to use the group capture in a regular expression to parse a string. Can this be done with MySQL?

For example,

SELECT my_wishful_function('->hello world<-', '/->(.*)<-/')

and have it return:

hello world
GMB
  • 216,147
  • 25
  • 84
  • 135
Tom Rossi
  • 11,604
  • 5
  • 65
  • 96

3 Answers3

2

MySQL regex engine does not support backreferences, unlike other RDBMS such as Oracle or Postgres.

You can use regexp_replace() instead. That is, remove the unwanted parts instead of capturing the wanted part:

select regexp_replace('->hello world<-', '(^->)|(<-$)', '')

Regexp explanation: ^-> is the unwanted part at the beginning of the string, and <-$ is the unwanted part at the end of the string. We surround the parts with parentheses to define groups, and separate them with |, which means or.

Demo on DB Fiddle:

select regexp_replace('->hello world<-', '(^->)|(<-$)', '') new_col
| new_col     |
| :---------- |
| hello world |
Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
GMB
  • 216,147
  • 25
  • 84
  • 135
  • 2
    Just to extend support for REGEXP was introduced with MySQL 8. [How to do a regular expression replace in MySQL?](https://stackoverflow.com/questions/986826/how-to-do-a-regular-expression-replace-in-mysql/49925597#49925597) – Lukasz Szozda Dec 20 '19 at 21:46
  • 1
    Any solution for MySQL 5.7? – Tom Rossi Dec 21 '19 at 00:36
2

An elegant solution is to use REGEXP_SUBSTR() and positive lookahead/behind assertions :

SELECT REGEXP_SUBSTR('->hello world<-', '(?<=->).*(?=<-)');
-- hello world

(?<=...) is a positive lookbehind assertion : we check matching pattern is preceded by ...

(?=...) is a positive lookahead assertion : we check matching pattern is followed by ...

These assertions are not capturing, so we don't need to delete delimiters -> and <-

It's compatible with MySQL 8

Shadoweb
  • 5,812
  • 1
  • 42
  • 55
JCH77
  • 1,125
  • 13
  • 13
1

In MySQL 5.7, if the boundaries -> and <- only occur once in the string, you can use SUBSTRING_INDEX twice to extract the portion between the boundaries:

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('I said ->hello world<- today', '->', -1), '<-', 1)

Output

hello world
Nick
  • 138,499
  • 22
  • 57
  • 95