Starting from this table with a utf8mb4_unicode_ci
collation:
DROP TABLE IF EXISTS preferences;
CREATE TABLE IF NOT EXISTS preferences
(
id int auto_increment primary key,
user_id int not null,
title varchar(150) not null
) collate = utf8mb4_unicode_ci;
using these values:
INSERT INTO preferences (user_id, title) VALUES (1, 'Preference 1'), (1, 'Preference 2'), (1, 'Preference 3'), (1, 'Preference 4'), (1, 'Preference 5');
This query
SELECT CONCAT('"', GROUP_CONCAT(p.title SEPARATOR '","'), '"') AS title
FROM preferences AS p
GROUP BY p.user_id
with a group_concat_max_len=64
results in
title |
---|
"Preference 1","Preference 2","Preference 3","Preference 4","Pref" |
But when using that query in a derived table, like so:
SELECT preferences_grouped.title
FROM (
SELECT CONCAT('"', GROUP_CONCAT(p.title SEPARATOR '","'), '"') AS title
FROM preferences AS p
GROUP BY p.user_id
) as preferences_grouped
you'll get only 16 chars (64/4)
title |
---|
"Preference 1","Pr |
This is not happening when the concat
is removed or moved to the primary query:
SELECT CONCAT('"', preferences_grouped.title, '"')
FROM (
SELECT GROUP_CONCAT(p.title SEPARATOR '","') AS title
FROM preferences AS p
GROUP BY p.user_id
) as preferences_grouped;
CONCAT('"', preferences_grouped.title, '"') |
---|
"Preference 1","Preference 2","Preference 3","Preference 4","Pref" |
So it seems that adding the concat in combination with a derived table ànd a utf8mb4
charset causes the byte count for the chars in the result set to be quadrupled, meaning my group_concat_max_len
should really be quadrupled too.
- Why is that?
- Can I do something using
COLLATE
to fix this / change this behavior?
Note: When the preferences
table has a non-utf8mb4 charset, like the default latin1_swedish_ci
collation, then there's also no byte-quadrupling going on.
Check the fiddle here: https://www.db-fiddle.com/f/hQD6r3pE5pmgR3n2XA7VtX/5