0

The column textfield has comma-seperated list values

ID  | textfield
1   |  english,russian,german
2   |  german,french
3   |  english
4   |  null

I'm attempting to count the amount of languages in textfield. The default language is "English", so if null then "English". The correct amount of languages is 4(english,russian,german,french).

Here is my query to attempt doing this:

SELECT SUM((length(`textfield`) - length(replace(`textfield`, ',', '')) + 1)) as my
FROM yourtable;

The result i get is 6, i don't know how to group the languages.

Here is fiddle http://sqlfiddle.com/#!9/0e532/1

The desired result is 4. How do i solve?

user892134
  • 3,078
  • 16
  • 62
  • 128
  • This is a classic example of why the answer to [Is storing a delimited list in a database column really that bad?](https://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad) is yes. – Nick Jul 02 '20 at 04:14
  • Which version of MySQL are you using? – Nick Jul 02 '20 at 04:45
  • @Nick using mysql version 8 – user892134 Jul 02 '20 at 04:51

2 Answers2

1

For version 5.6 (like in the fiddle)

SELECT COUNT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(languages.textfield, ',', numbers.num), ',', -1)) languages_count
FROM (SELECT COALESCE(textfield, 'english') textfield
      FROM yourtable) languages
JOIN (SELECT 1 num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) numbers
    ON numbers.num <= LENGTH(languages.textfield) - LENGTH(REPLACE(languages.textfield, ',', '')) + 1;

fiddle

For version 8.x (as claimed in a comment)

SELECT COUNT(DISTINCT jsontable.value) languages_count
FROM yourtable
CROSS JOIN JSON_TABLE( CONCAT('["', REPLACE(COALESCE(textfield, 'english'), ',', '","'), '"]'),
                       "$[*]" COLUMNS( value VARCHAR(254) PATH "$" )
                     ) AS jsontable;

fiddle

Akina
  • 39,301
  • 5
  • 14
  • 25
1

Identifying the source of error

What your query is doing is counting how many languages in each row, and adding them all together. Your query does not take into account duplicates. Since English shows up twice in the table, it is counted twice (and German, too), hence in your example six. Also, another issue is that your current code considers null as what null truly means, the absence of a value. For example, if your database was

ID | textfield
---|----------
1  | null

you would also be arriving at incorrect results (more on this below).

Solution

This gets you a comma separated result of the languages, no duplicates.

SELECT
    GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(textfield, ',', n.digit+1), ',', -1)) textfield
  FROM
    yourtable
    INNER JOIN
    (SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6) n
    ON LENGTH(REPLACE(textfield, ',', '')) <= LENGTH(textfield)-n.digit;

This query can serve as a subquery for what you were attempting to do in the question prompt. In other words, instead of the length('textfield') ... you would provide the resulting column name from this query

Null as in English

This logic should not be implemented at the database level, IMHO. If you want to go ahead and consider null entries as English, that is fine. The downside is the example I provided for you before. When you have a query that solves for the total languages in the database, if English wasn't an explicitly stated language and instead just a null value, then the query wouldn't 'count' English (it's null). But you can't just add 1 every time you find the amount of languages because English might already be explicit.

Recommendations:

Ari
  • 156
  • 5
  • Thanks, can you explain this part of the query `SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6)` ? – user892134 Jul 02 '20 at 05:34
  • @user892134 https://stackoverflow.com/questions/44546968/mysql-select-distinct-comma-delimited-values – Ari Jul 02 '20 at 13:42