It's possible to get the resultset, but I'd really only do this to convert this to another table, with one row per author. I wouldn't want to run queries like this from application code.
The SUBSTRING_INDEX
function can be used to extract the first, secpond, et al. author from the list, e.g.
SUBSTRING_INDEX(SUBSTRING_INDEX(authors,',', 1 ),',',-1) AS author1
SUBSTRING_INDEX(SUBSTRING_INDEX(authors,',', 2 ),',',-1) AS author2
SUBSTRING_INDEX(SUBSTRING_INDEX(authors,',', 3 ),',',-1) AS author3
But this gets messy at the end, because you get the last author when you retrieve beyond the length of the list.
So, you can either count the number of commas, with a rather ugly expression:
LENGTH(authors)-LENGTH(REPLACE(authors,',','')) AS count_commas
But it's just as easy to append a trailing comma, and then convert empty strings to NULL
So, replace authors with:
CONCAT(authors,',')
And then wrap that in TRIM and NULLIF functions.
NULLIF(TRIM( foo ),'')
Then, you can write a query that gets the first author from each row, another query that gets the second author from each row (identical to the first query, just change the '1' to a '2', the third author, etc. up to the maximum number of authors in a column value. Combine all those queries together with UNION operations (this will eliminate the duplicates for you.)
So, this query:
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',1),',',-1)),'') AS author
FROM unfortunately_designed_table a
UNION
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',2),',',-1)),'')
FROM unfortunately_designed_table a
UNION
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',3),',',-1)),'')
FROM unfortunately_designed_table a
UNION
SELECT NULLIF(TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(a.authors,','),',',4),',',-1)),'')
FROM unfortunately_designed_table a
this will return a resultset of unique author names (and undoubtedly a NULL). That's only getting the first four authors in the list, you'd need to extend that to get the fifth, sixth, etc.
You can get the maximum count of entries in that column by finding the maximum number of commas, and adding 1
SELECT MAX(LENGTH(a.authors)-LENGTH(REPLACE(a.authors,',','')))+1 AS max_count
FROM unfortunately_designed_table a
That lets you know how far you need to extend the query above to get all of the author values (at the particular point in time you run the query... nothing prevents someone from adding another author to the list within a column at a later time.
After all the work to get distinct author values on separate rows, you'd probably want to leave them in a list like that. It's easier to work with.
But, of course, it's also possible to convert that resultset back into a comma delimited list, though the size of the string returned is limited by max_allowed_packet
session variable (iirc).
To get it back as a single row, with a comma separated list, take that whole mess of a query from above, and wrap it in parens as an line view, give it an alias, and use the GROUP_CONCAT
function.
SELECT GROUP_CONCAT(d.author ORDER BY d.author) AS distinct_authors
FROM (
...
) d
WHERE d.author IS NOT NULL
If you think all of these expressions are ugly, and there should be an easier way to do this, unfortunately (aside from writing procedural code), there really isn't. The relational database is designed to handle information in tuples (rows), with each row representing one entity. Stuffing multiple entities or values into a single column goes against relational design. As such, SQL does not provide a simple way to extract values from a string into separate tuples, which is why the code to do this is so messy.