3

My Ubuntu Web development system was recently updated from 19.04 to 19.10 that included an update of MySQL, which is now version 8.0.18. The live server has version 5.6.45 (it is a shared server running CentOS so I cannot change it) and I discovered that something in REGEXP has apparently changed with the update.

The following query, when run locally, gives Illegal argument to a regular expression but runs properly on the live server. I am not too familiar with regular expressions much less REGEXP in MySQL so how can I make this work in the newer MySQL without breaking it in the old one?

These are the two REGEXP lines by themselves, followed by the complete query.

REGEXP '[[:<:]][0-9]+(-[0-9]+)*[[:>:]]'

REGEXP '^[0-9]+(-[0-9]+)*$'

SELECT t.ID AS partID, t.partNo,
  TRIM(CONCAT(
    IF(N.n = 0, '', LEFT(t.Model, 2)),
    SUBSTRING_INDEX(SUBSTRING_INDEX(t.Model, '-', N.n + 1), '-', -1)
  )) AS modelNo, NULL AS bodyNo, t.xHD AS isRHD
FROM (
  SELECT
    SUBSTRING_INDEX(SUBSTRING_INDEX(t.part, ' ', N.n + 1), ' ', -1) AS Model,
    CASE LEFT(t.part, 3) WHEN 'LHD' THEN 1 WHEN 'RHD' THEN 2 ELSE 0 END AS xHD,
    t.ID, t.GroupNumber, t.partNo, t.Models
  FROM (
    SELECT
      LTRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(p.Models, ';', N.n + 1), ';', -1)) part,
      p.ID, p.GroupNumber, p.partNo, p.Models
    FROM parts_listing AS p CROSS JOIN parts_temp AS N
    WHERE Models REGEXP '[[:<:]][0-9]+(-[0-9]+)*[[:>:]]' AND
          N.n <= LENGTH(Models) - LENGTH(REPLACE(Models, ';', ''))
  ) AS t CROSS JOIN parts_temp AS N
  WHERE N.n <= LENGTH(t.part) - LENGTH(REPLACE(t.part, ' ', ''))
) AS t CROSS JOIN parts_temp AS N
WHERE t.Model REGEXP '^[0-9]+(-[0-9]+)*$' AND
      N.n <= LENGTH(t.model) - LENGTH(REPLACE(t.Model, '-', '')); 
GMB
  • 216,147
  • 25
  • 84
  • 135
DonP
  • 725
  • 1
  • 8
  • 27
  • Please build a dbfiddle example, or show us what pattern the regexp should detect should detect. – nbk Dec 07 '19 at 22:34
  • I have no idea how to do that but, unless the server has v8.x of MySQL, it will not have the problem so not sure what the point would be. – DonP Dec 07 '19 at 22:38
  • Your regexpression have to search for some pattern, so So make some SELECT 'Testmeplease' REGEXP '[[:<:]][0-9]+(-[0-9]+)*[[:>:]]' and the rsult is true. Somehow you must have teste the expressions – nbk Dec 07 '19 at 22:59
  • Does this answer your question? [Error Code: 3685. Illegal argument to a regular expression](https://stackoverflow.com/questions/59998409/error-code-3685-illegal-argument-to-a-regular-expression) – badams Aug 01 '22 at 21:23

1 Answers1

4

MySQL 8.0.4 introduced changed the implementation of its regexes engine from Henry Spencer's implementation to Internation Components for Unicode (ICU). This involves some non-backward compatible changes, which are listed in the documentation. Here is the part that is of interest for your use case:

The Spencer library supports word-beginning and word-end boundary markers ([[:<:]] and [[:>:]] notation). ICU does not. For ICU, you can use \b to match word boundaries; double the backslash because MySQL interprets it as the escape character within strings.

In other words, the following regexp is invalid in MySQL 8.0.4 onwards: '[[:<:]][0-9]+(-[0-9]+)*[[:>:]]', because it contains Henry Spencer's boudary markers.

how can I make this work in the newer MySQL without breaking it in the old one?

Short answer: you can't. Changes are not backward compatible. You may be able to work around this situation by defining a custom expression (or character class) that represents word word boundaries for your use case. Here is an extremely simplified version, that handles a space or the beginning/end of the string:

'(^|\s)[0-9]+(-[0-9]+)*(\s|$)
GMB
  • 216,147
  • 25
  • 84
  • 135
  • Backward compatibility isn't too important for me as the query is always run on my local system anyway but odd that something would be "fixed" by breaking it! For other users, that may be a deal-breaker for upgrading. In any event, if I understood you, I tried ***REGEXP '\b[0-9]+(-[0-9]+)*\b'*** and no longer get the error but I also get no results although your ***REGEXP '(^|\s)[0-9]+(-[0-9]+)*(\s|$)'*** example gives over 14,000 rows almost instantly so is much faster than before although I've not yet verified that they have the proper data. – DonP Dec 07 '19 at 22:56
  • I see that I failed to put double backslashes. This works ***WHERE Models REGEXP '\\b[0-9]+(-[0-9]+)*\\b'*** but not nearly as fast as your ***REGEXP '(^|\s)[0-REGEXP '(^|\s)[0-9]+(-[0-9]+)*(\s|$)'9]+(-[0-9]+)*(\s|$)'*** so I'll have to check the resulting data of both. I didn't time them but the first one has a noticeable lag while the other does not. – DonP Dec 07 '19 at 23:02
  • Followup: there is a difference as the first one gives nearly 15,000 rows while the second gives 67,280 which seems closer to what's needed. – DonP Dec 07 '19 at 23:05