I'll show a function that is based on string splitting as shown in SQL split values to multiple rows to be used like
SELECT get_the_parts(theString) FROM example;
This function works as this:
- split the string into a table with parts using the double quote as delimiter
- select only those parts that match exactly 3 upper case letters or 5 upper case letters followed by three digits
- concatenates the selected parts with the comma as separator
DELIMITER //
CREATE FUNCTION get_the_parts(myString VARCHAR(2000)) RETURNS VARCHAR(2000)
BEGIN
DECLARE result VARCHAR(2000);
SELECT
GROUP_CONCAT(t.value) INTO result
FROM (
SELECT
SUBSTRING_INDEX(SUBSTRING_INDEX(e.col, '"', n.n), '"', -1) value
FROM ( SELECT myString AS col ) e
CROSS JOIN (
-- creates a numbers table with the values from 1 to 1,000 on the fly
SELECT
1 + a.N + b.N * 10 + c.N * 100 AS n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
ORDER BY n
) n
WHERE
n.n <= 1 + LENGTH(myString) - LENGTH(REPLACE(myString, '"', ''))
) t
WHERE
t.value REGEXP '^([A-Z]{3}|[A-Z]{5}[0-9]{3})$';
return result;
END //
DELIMITER ;
Explanation
Creating a numbers table
The innermost subselect with the UNION ALL
creates on the fly a numbers table with the numbers from 1 to 1000. This subselect could easily substituted by a numbers table in your database.
Splitting the string
With a nested call of SUBSTRING_INDEX we cut the n-th substring between the separators. We use the double quote as separator:
SUBSTRING_INDEX(SUBSTRING_INDEX(e.col, '"', n.n), '"', -1)
The expression
1 + LENGTH(myString) - LENGTH(REPLACE(myString, '"', ''))
gives us the count of parts, because it's one occurrence more than occurrences of the separator.
Selecting the wanted parts
We use the the regexp
'^([A-Z]{3}|[A-Z]{5}[0-9]{3})$'
because our parts have got to exactly match the known regexp from the beginning ^
until the end $
without having more content.
Concatenating
Lastly we use GROUP_CONCATE with the default separator, the comma, to get the desired result.
Remarks
I've built a Demo.
You can easily modify this function to meet your needs.
Do you really prefer this? I would recommend to use your one-line javascript. Text processing of this form isn't really what relational databases are best.