0

Within my table, I have a column for tags which is a LIST of strings.

["conda"]
["intel"]
["pandas", "conda"]
["api", "partner"]
["dask", "distributed computing", "conda"]

I'd like to be able to get a count of each distinct string.

i.e.

-------------------
 tag   | count
-------------------
conda  | 3
pandas | 1
...
-------------------

Thus far, I've been doing it the difficult way and using this...

SELECT tags
FROM "public"."content"
WHERE concat(tags) LIKE '%{INSERT_TAG_NAME_HERE}%'

Thanks in advance!

user570104
  • 71
  • 2
  • 7

2 Answers2

0

What version of SQL are you using? If using SQL Server (T-SQL), one approach is to split your concatenated text field into pieces using a table-valued function. Something like this can get you started:

Edit: Oops! Newer SQL Server has the string_split() function which you can similarly cross apply, eliminating the need for tvfSplitToTableStrings(). See the docs.

CREATE FUNCTION dbo.tvfSplitToTableStrings
(
        @items NVARCHAR(max),
        @delimiter CHAR(1)
)
RETURNS @itemTable table
(
        [item]  NVARCHAR(100)
        , sequence INT
)
as
BEGIN
        DECLARE
                @tempItemList NVARCHAR(max),
                @i int,
                @item NVARCHAR(100)
        SET @tempItemList = @items
        SET @tempItemList = LTRIM(RTRIM(@items))
        -- get index where our delimiter was found
        SET @i = CHARINDEX(@delimiter, @tempItemList)
        -- loop while all the characters in the list have not been traversed yet
        declare @count int = 0
        WHILE (LEN(@tempItemList) > 0)
        BEGIN
            IF @i = 0
                -- if there are no delimiters, then this is the only item in our list
                SET @item = @tempItemList
            ELSE
                -- get the first word (from the left) less the delimiter character
                SET @item = LEFT(@tempItemList, @i - 1)
            set @count = @count + 1
            INSERT INTO @itemTable (item, sequence) VALUES (LTRIM(RTRIM(@item)), @count)
            IF @i = 0
                SET @tempItemList = ''
            ELSE
                -- remove the word we just added to the table
                SET @tempItemList = RIGHT(@tempItemList, LEN(@tempItemList) - @i)
            -- lather, rinse, repeat
            SET @i = CHARINDEX(@delimiter, @tempItemList)
        END 
        RETURN;
END

You can then use it with cross apply as follows:

;with a as (
    select 1 as id, 'dog,cat,fish' as animal
    union
    select 2 as id, 'snake,cat,racoon' as animal
)
select a.id, b.item, b.sequence
    from a
    cross apply app.tvfSplitToTableStrings(a.animal, ',') b

Since your strings seem to contain extra markup ["...", "..."], you can clean this up using TRANSLATE(). Or you could modify the tvfSplitToTableStrings function itself to parse and strip unneeded markup more intelligently. Assuming your string values do not contain any of the markup characters (including embedded spaces), the following should work:

;with a as (
    select 1 as id, '["dog", "cat", "fish"]' as animal
    union
    select 2 as id, '["snake", "cat", "racoon"]' as animal
)
, b as (
    select a.id
    , TRIM(TRANSLATE(b.item, '[]" ', '    ')) as item
    , b.sequence
    from a
    cross apply app.tvfSplitToTableStrings(a.animal, ',') b
)
select item, count(*) as [count] 
    from b
    group by item
    order by item;

If you're using MySQL, you'd probably want to take an approach like https://stackoverflow.com/a/17942691.

Also note that you may need to beware of performance. cross apply is fine with smaller datasets but may not scale well.

Martin_W
  • 1,582
  • 1
  • 19
  • 24
  • I'm actually unclear what version of SQL is being used because I'm trying to do this voo doo magic in Metabase. https://www.metabase.com/docs/latest/users-guide/writing-sql.html – user570104 Oct 11 '21 at 14:17
0

I understand this a bit better now. A note on Metabase, Metabase uses the services are using so in my case Postgres.

I believe the issue on some of my queries was around some null and empty values.

SELECT tag, count(*) AS cnt FROM (
    SELECT 
        content.id, content.tags, tag FROM content,
        jsonb_array_elements(
            case jsonb_typeof(content.tags::jsonb) 
                when 'array' then content.tags 
                else '[]' end
    ) j(tag)
) inn
GROUP BY tag
ORDER BY cnt DESC

Hope this helps others in the future.

user570104
  • 71
  • 2
  • 7