2

I have a "changesets" table which has a comments column where people enter references to bug issues in the format "Fixed issue #2345 - ......", but can also be "Fixed issues #456, #2956, #12345 ...." what's the best way to select these reference numbers so i can access the issues via a join.

given this change sets table

id     comments
===========================
1      fixed issue #234 ....
2      DES - #789, #7895, #123  
3      closed ticket #129

i'd like results like this

changeset_id issue_id
=====================
1            234
2            789
2            7895
2            123
3            129

I've used substring_index(substring_index('#',-1),' ',1) type construct but that will only return a single reference per line.

Also looking for the most efficient way to do this text lookup

Any help appreciated

Thanks

d sharpe
  • 189
  • 2
  • 12
  • http://stackoverflow.com/questions/1096679/can-mysql-split-a-column gives hint on how to split a string. – Fluff Nov 03 '11 at 20:00
  • That's going to be a bit of a nasty problem to solve, I think. Does it *have* to be all in SQL? There does come a point where things are just easier to do in your application, such as get all comments with a '#', then parse out the numbers and do something with them. – Cylindric Nov 30 '11 at 14:39

1 Answers1

0

Here's one (bloated/messy) approach on how to get the desired dataset...

Step 1 - figure out what the maximum # of issue ids is

SELECT MAX(LENGTH(comments)- LENGTH(REPLACE(comments,'#',''))) AS max_issues 
FROM change_sets

Step 2 - recursively create a UNION'd query with a number of "levels" equal to the maximum number of issue ids. For your example,

SELECT changeset_id, issue_id  FROM
(
SELECT id AS changeset_id, CAST(SUBSTRING_INDEX(comments,'#',-1) AS UNSIGNED) AS issue_id FROM change_sets 
UNION
SELECT id AS changeset_id, CAST(SUBSTRING_INDEX(comments,'#',-2) AS UNSIGNED) AS issue_id FROM change_sets 
UNION
SELECT id AS changeset_id, CAST(SUBSTRING_INDEX(comments,'#',-3) AS UNSIGNED) AS issue_id FROM change_sets 
) a  
HAVING issue_id!=0 
ORDER BY changeset_id, issue_id

I'm taking advangage of UNION's ability to remove duplicate rows, and CAST's ability to use the leading numeric values when deciding the integer.

The result using your toy dataset:

+--------------+----------+
| changeset_id | issue_id |
+--------------+----------+
|            1 |      234 |
|            2 |      123 |
|            2 |      789 |
|            2 |     7895 |
|            3 |      129 |
+--------------+----------+
Riedsio
  • 9,758
  • 1
  • 24
  • 33