What version of SQL are you using? If using SQL Server (T-SQL), one approach is to split your concatenated text field into pieces using a table-valued function. Something like this can get you started:
Edit: Oops! Newer SQL Server has the string_split()
function which you can similarly cross apply
, eliminating the need for tvfSplitToTableStrings()
. See the docs.
CREATE FUNCTION dbo.tvfSplitToTableStrings
(
@items NVARCHAR(max),
@delimiter CHAR(1)
)
RETURNS @itemTable table
(
[item] NVARCHAR(100)
, sequence INT
)
as
BEGIN
DECLARE
@tempItemList NVARCHAR(max),
@i int,
@item NVARCHAR(100)
SET @tempItemList = @items
SET @tempItemList = LTRIM(RTRIM(@items))
-- get index where our delimiter was found
SET @i = CHARINDEX(@delimiter, @tempItemList)
-- loop while all the characters in the list have not been traversed yet
declare @count int = 0
WHILE (LEN(@tempItemList) > 0)
BEGIN
IF @i = 0
-- if there are no delimiters, then this is the only item in our list
SET @item = @tempItemList
ELSE
-- get the first word (from the left) less the delimiter character
SET @item = LEFT(@tempItemList, @i - 1)
set @count = @count + 1
INSERT INTO @itemTable (item, sequence) VALUES (LTRIM(RTRIM(@item)), @count)
IF @i = 0
SET @tempItemList = ''
ELSE
-- remove the word we just added to the table
SET @tempItemList = RIGHT(@tempItemList, LEN(@tempItemList) - @i)
-- lather, rinse, repeat
SET @i = CHARINDEX(@delimiter, @tempItemList)
END
RETURN;
END
You can then use it with cross apply
as follows:
;with a as (
select 1 as id, 'dog,cat,fish' as animal
union
select 2 as id, 'snake,cat,racoon' as animal
)
select a.id, b.item, b.sequence
from a
cross apply app.tvfSplitToTableStrings(a.animal, ',') b
Since your strings seem to contain extra markup ["...", "..."]
, you can clean this up using TRANSLATE()
. Or you could modify the tvfSplitToTableStrings
function itself to parse and strip unneeded markup more intelligently. Assuming your string values do not contain any of the markup characters (including embedded spaces), the following should work:
;with a as (
select 1 as id, '["dog", "cat", "fish"]' as animal
union
select 2 as id, '["snake", "cat", "racoon"]' as animal
)
, b as (
select a.id
, TRIM(TRANSLATE(b.item, '[]" ', ' ')) as item
, b.sequence
from a
cross apply app.tvfSplitToTableStrings(a.animal, ',') b
)
select item, count(*) as [count]
from b
group by item
order by item;
If you're using MySQL, you'd probably want to take an approach like https://stackoverflow.com/a/17942691.
Also note that you may need to beware of performance. cross apply
is fine with smaller datasets but may not scale well.