1

I am trying to do the following but I cannot manage to get it right yet :(.

I have these tables:

table1 -> tb1_id, tb1_name 

Sample Data:
--------------
1  group1
2  group2
3  group3
4  group4
5  group5

table2 -> tb2_id, tb2_sector, tb2_tb3_id   

Sample Data:
--------------
1  alpha   1
2  beta    2
3  gamma   2
4  delta   2
5  epsilon 4

table3 -> tb3_id, tb3_mid, tb3_section 

Sample Data:
--------------
1   234   alpha,beta,gama,delta

This is the output that I am looking for:

Name    Count     %
------  -----    -----
group1   1       25%
group2   3       75%
group3   0       0%
group4   0       0%
group5   0       0%

Basically I need a split a column value delimited by a comma (tb3_section in table3) and then find the right group for each value (table2 gives me the group id to link with table1) and then do a total count by group and get the percentage (assuming total is 100%).

This is the query I tried so far:

I searched for split value samples and found one that does the split by creating a numbers table first:

create table numbers (
  `n` INT(11) SIGNED 
  , PRIMARY KEY(`n`)
) 

INSERT INTO numbers(n) SELECT @row := @row + 1 FROM 
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t,
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t2, 
(SELECT 0 UNION ALL SELECT 1) t8, 
(SELECT @row:=0) ti;

Afterwards, I did this:

select tb3_section, count(1) from ( 
  select 
    tb3_mid, 
    substring_index(
      substring_index(tb3_section, ',', n), 
      ',', 
      -1
    ) as tb3_section  from table3 
  join numbers
    on char_length(tb3_section) 
      - char_length(replace(tb3_section, ',', '')) 
      >= n - 1
) tb3_section_dashboard
group by 1

This doesn't give me the group count. Just does the split of tb3_section but doesn't give me the correct count and equivalent percentage. Any ideas will be much appreciate it thanks a lot.

LATEST UPDATE

First of all, I would like to thanks @eggyal for pointing me to the right direction and @Shadow for despise knowing that I was not taking the best approach, he came up with a quick fix to my problem. I managed to change the approach and removed the comma delimited values from table3. Instead now I add multiple rows for each new value (and added a constraint to avoid duplicates).

Now table3 looks like:

Sample Data:
--------------
1   234   alpha
2   234   beta
3   234   gama
4   234   delta
5   235   alpha

Here is the query I have taken from @shadow sample:

SELECT t1.tb1_name, COUNT(t3.tb3_section) AS no_per_group,
COUNT(t3.tb3_section) / t4.no_of_groups AS percentage 
FROM t1 left
JOIN t2 ON t1.tb1_id=t2.tb2_tb3_id 
INNER JOIN t3 ON t2.tb2_sector=t3.tb3_section>0 
JOIN (SELECT COUNT(*) AS no_of_groups
       FROM t3 INNER JOIN t2 ON t2.tb2_sector=t3.tb3_section>0) t4 
GROUP BY t1.tb1_name

Instead of using find_in_set now I use = to match the exact value. Now I get something like the following but the percentage looks odd and I miss a group that doesn't have a match:

Name        no_per_group     percentage
-----       -------------    ----------
group1      2                0.1053
group3      3                0.1579
group4      3                0.1579
group5      3                0.1579

Although still I need something like:

Name    Count     %
------  -----    -----
group1   1       25%
group2   3       75%
group3   0       0%
group4   0       0%
group5   0       0%

Notice that if there is no match in a group, I still need to show that group. Because I have thousands of records which are different from each other, I need to add another condition: where tb3_mid=234 . Likes this, the results are using to tb3_mid.

carol1287
  • 397
  • 4
  • 17
  • 2
    You may be interested to read [@Bill Karwin](https://stackoverflow.com/users/20860/bill-karwin)'s answer to [Is storing a delimited list in a database column really that bad?](http://stackoverflow.com/a/3653574) – eggyal Aug 23 '16 at 14:27
  • Thank you so much @eggyal for pointing to bill's great answer. In my case, the problem is that I do not have full control of the actual tables as I did not create them :(. However, I can experience now the problems with storing delimited values within one column. – carol1287 Aug 23 '16 at 14:37
  • You can construct a join predicate from MySQL's [`FIND_IN_SET()`](https://dev.mysql.com/doc/en/string-functions.html#function_find-in-set) function. – eggyal Aug 23 '16 at 14:43

1 Answers1

2

The best solution would be to redesign your table structure and move the data in the delimited values list to a separate table.

The quick solution is to utilise MySQL's find_in_set() function.

To get the total count of entries in the messages table (table3):

select count(*) as no_of_groups
from t3 inner join t2 on find_in_set(t2.tb2_sector,t3.tb3_section)>0

To get the counts per group, add a join to table1 and group by group name. To calculate the percentage, add the above query as a subquery:

select t1.tb1_name, count(t3.tb3_section) as no_per_group, count(t3.tb3_section) / t4.no_of_groups as percentage
from t1 left join t2 on t1.tb1_id=t2.tb2_tb3_id
inner join t3 on find_in_set(t2.tb2_sector,t3.tb3_section)>0
join (select count(*) as no_of_groups
      from t3 inner join t2 on find_in_set(t2.tb2_sector,t3.tb3_section)>0) t4  --no join condition makes a Cartesian join
group by t1.tb1_name
Shadow
  • 33,525
  • 10
  • 51
  • 64
  • Thanks a lot Shadow for spending the time to help me out. Your query solved the original problem. I managed to convince my colleague to change the table structure I took yours and @eggyal' recommendation and modified the structure. I will update my question now with what I did. Now I get nearly the results I want with the exception that when there is no match, I do not get the 0% but I think must be a left join missing somewhere. – carol1287 Aug 24 '16 at 02:16