Group by like of a column value but may take each row to more than one group counter

Question

I need to count how many hash tags (or other text) appearances I have in a table. A hash tag placed in a cell (column), but a cell may contains more than one hash tag. A cell contains up to one hash tag of each type.

Example inspired from that question but need to get differ result.

Lets say I have a table that similar to that:

+----+-----------------------+-------------+--+
| id |       Items           | timestamp1  |  |
+----+-----------------------+-------------+--+
|  1 | #Car #Dog #Fish       |  01/01/2018 |  |
|  2 | #Dog                  |  25/01/2018 |  |
|  3 | #Fish #Dog            |  18/03/2019 |  |
|  4 | #Car #Dog #Fish       |  23/06/2019 |  |
|  5 | #Bird                 |  17/10/2019 |  |
+----+-----------------------+-------------+--+

And I need to count how many times each item (#Car, #Dog, etc..) appear.

I have tried this

SELECT (CASE WHEN items like '%#Dog%' THEN 'Dogs' 
             WHEN items like '%#Car%' THEN 'Cars'   
             WHEN items like '%#Fish%' THEN 'Fish'   
             WHEN items like '%#Bird%' THEN 'Birds'   
             END) as Item, count(*)
FROM  observations
GROUP BY (CASE WHEN items like '%#Dog%' THEN 'Dogs' 
               WHEN items like '%#Car%' THEN 'Cars'   
               WHEN items like '%#Fish%' THEN 'Fish'   
               WHEN items like '%#Bird%' THEN 'Birds'   
         END);

But this will give the (undesired) result:

+-------+----------+
| Item  | count(*) |
+-------+----------+
| Birds |        1 |
| Dogs  |        4 |
+-------+----------+

Desired results:

+-------+----------+
| count |   Item   |
+-------+----------+
|     4 | #Dog     |
|     2 | #Car     |
|     3 | #Fish    |
|     1 | #Bird    |
+-------+----------+

Is any good way to achieve that?

Fix your data model! Do not store multiple values in a string! Use an association/junction table. — Gordon Linoff, Dec 04 '19 at 12:15
@GordonLinoff that already exits. Actually the example is just for demo. The real is an SMS outbox table, and I need to count some keywords (not a big amount of data so prefer only SQL if possible). Obviously it is not a good method when there are many groups.. but my question is for small cases — Ron, Dec 04 '19 at 12:18
Can an item appear more than once under the same id? If it could, would you want to count it again under that id? — Strawberry, Dec 04 '19 at 12:33
@Strawberry Item can appear just one time for an id. I'll add that to question. Although it may be interesting to examine the case of one item appear more then once for an id in a cell and need to be counted as well. (but it not my case anyway ..) — Ron, Dec 04 '19 at 13:54

forpas · Answer 1 · 2019-12-04T14:00:52.580

3

If the items are known then use UNION ALL to get a query that returns them and then join to the table:

SELECT t.Item, COUNT(o.id) counter
FROM (
  SELECT '#Car' AS Item UNION ALL SELECT '#Dog' UNION ALL
  SELECT '#Fish' UNION ALL SELECT '#Bird'
) AS t LEFT JOIN observations o
ON CONCAT(' ', o.Items, ' ') LIKE CONCAT('% ', t.Item, ' %')
GROUP BY t.Item;

See the demo.
Results:

| Item  | counter |
| ----- | ------- |
| #Car  | 2       |
| #Dog  | 4       |
| #Fish | 3       |
| #Bird | 1       |

edited Dec 04 '19 at 14:00

answered Dec 04 '19 at 12:43

forpas

160,666
10
38
76

Thanks! Seems to be the desired results :) Can you explain please how does it works? I see you actually creating a 'virtual' table that contains the known items, that's OK. but how the join with the 'concat' and 'like' goes? – Ron Dec 04 '19 at 14:24
The condititon of the join is if an Item say `#Car #Dog #Fish` contains a value from the unioned query like `#Dog`. This is what the LIKE operator does. I add spaces at the beginning and end of `#Car #Dog #Fish` by using concat, to take into account any special case like the case there is only 1 value in the Item. – forpas Dec 04 '19 at 14:31

score 1 · Answer 2 · answered Dec 04 '19 at 12:38

I'm not seriously advocating this as a solution because the correct approach is to fix your data model. However, if this is a one-off problem, for a small data set...

I have a table of integers - ints(i) [0-9], but if using MySQL 8.0+, you can emulate this behaviour without the need for the utility table...

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,items VARCHAR(120) NOT NULL
);

INSERT INTO my_table VALUES
(1,'#Car #Dog #Fish'),
(2,'#Dog'),
(3,'#Fish #Dog'),
(4,'#Car #Dog #Fish'),
(5,'#Bird');

SELECT n
     , COUNT(*) total 
  FROM 
     ( SELECT DISTINCT id
                     , TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(items,'#',i+1),'#',-1))n 
                  FROM my_table
                     , ints 
                 WHERE TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(items,'#',i+1),'#',-1)) <> ''
     ) a
 GROUP
    BY n;

+------+-------+
| n    | total |
+------+-------+
| Bird |     1 |
| Car  |     2 |
| Dog  |     4 |
| Fish |     3 |
+------+-------+

This assumes that a keyword only appears once per id or, at least, only wants to be counted once per id.

Group by like of a column value but may take each row to more than one group counter

2 Answers2