1

Suppose I have a string of random characters with quoted strings inside like this:

a:15:i:0s:3:"FOO"i:1s:3:"BAR"i:2s:3:"BAZ"i:3s:3:"ALPHA100"i:4s:3:"ALPHA500"i:5s:3:"BRAVO250"i:6s:3

I'd like to match the quoted strings inside which always contain either three uppercase letters or five uppercase letters followed by three numbers. In JavaScript I can do this:

" the string ".match(/"([A-Z]{3}|[A-Z]{5}\d{3})"/g);

I've looked into the REGEXP operator for MySQL but that seems to only be applicable for conditional clauses.

Ideally I'd like to select all the strings, trim off the quotation marks, and do a GROUP_CONCAT to get back a final result set of one row like this:

"FOO,BAR,BAZ,ALPHA100,ALPHA500,BRAVO250"

Ideally I'd like to do this at the database level to avoid downloading all the strings and running them through my one-line JavaScript program.

rink.attendant.6
  • 44,500
  • 61
  • 101
  • 156

1 Answers1

0

I'll show a function that is based on string splitting as shown in SQL split values to multiple rows to be used like

SELECT get_the_parts(theString) FROM example;

This function works as this: - split the string into a table with parts using the double quote as delimiter - select only those parts that match exactly 3 upper case letters or 5 upper case letters followed by three digits - concatenates the selected parts with the comma as separator

DELIMITER //
CREATE FUNCTION get_the_parts(myString VARCHAR(2000)) RETURNS VARCHAR(2000)
BEGIN
  DECLARE result VARCHAR(2000);

  SELECT
    GROUP_CONCAT(t.value) INTO result
    FROM (
      SELECT
        SUBSTRING_INDEX(SUBSTRING_INDEX(e.col, '"', n.n), '"', -1) value
      FROM ( SELECT myString AS col ) e
      CROSS JOIN (
        -- creates a numbers table with the values from 1 to 1,000 on the fly
        SELECT 
            1 + a.N + b.N * 10 + c.N * 100 AS n
        FROM
            (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
            ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
            ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
        ORDER BY n    
      ) n
      WHERE
        n.n <= 1 + LENGTH(myString) - LENGTH(REPLACE(myString, '"', '')) 
    ) t
    WHERE 
        t.value REGEXP '^([A-Z]{3}|[A-Z]{5}[0-9]{3})$';

    return result;
END //
DELIMITER ;

Explanation

Creating a numbers table

The innermost subselect with the UNION ALL creates on the fly a numbers table with the numbers from 1 to 1000. This subselect could easily substituted by a numbers table in your database.

Splitting the string

With a nested call of SUBSTRING_INDEX we cut the n-th substring between the separators. We use the double quote as separator:

SUBSTRING_INDEX(SUBSTRING_INDEX(e.col, '"', n.n), '"', -1)

The expression

1 + LENGTH(myString) - LENGTH(REPLACE(myString, '"', ''))

gives us the count of parts, because it's one occurrence more than occurrences of the separator.

Selecting the wanted parts

We use the the regexp

'^([A-Z]{3}|[A-Z]{5}[0-9]{3})$'

because our parts have got to exactly match the known regexp from the beginning ^ until the end $ without having more content.

Concatenating

Lastly we use GROUP_CONCATE with the default separator, the comma, to get the desired result.

Remarks

I've built a Demo.
You can easily modify this function to meet your needs.
Do you really prefer this? I would recommend to use your one-line javascript. Text processing of this form isn't really what relational databases are best.

Community
  • 1
  • 1
VMai
  • 10,156
  • 9
  • 25
  • 34