7

How to use DISTINCT with JSON_ARRAYAGG?

Let's consider the below query as an example.

SELECT 
    staff.company,
    JSON_ARRAYAGG(
        JSON_OBJECT(
            'uuid', UuidFromBin(staff.uuid),
            'username', staff.username,
            'name', staff.name,
            'surname', staff.surname
        )
    )
FROM events_staff
JOIN staff ON staff.id = staff_id
LEFT JOIN skills s ON s.id = events_staff.skill_id
GROUP BY staff.company

Now, How can I use DISTINCT with JSON_ARRAYAGG in this query so that JSON objects will be distinct? It will be better if we can apply DISTINCT based on any key like uuid.

After googling for half an hour, I found the below options but was not able to apply these in the above query.

A JSON_ARRAYAGG DISTINCT returns a JSON array composed of all the different (unique) values for string-expr in the selected rows: JSON_ARRAYAGG(DISTINCT col1). The NULL string-expr is not included in the JSON array. JSON_ARRAYAGG(DISTINCT BY(col2) col1) returns a JSON array containing only those col1 field values in records where the col2 values are distinct (unique). Note however that the distinct col2 values may include a single NULL as a distinct value.

Radhe Shyam sharma
  • 900
  • 15
  • 21
  • Please post sample data. By the way, what's the purpose of doing a left join with `skills` table? I don't see it being used anywhere in the query. – FanoFN Feb 05 '22 at 05:34
  • I have removed other selects so that question will be dedicated to the actual query. – Radhe Shyam sharma Feb 05 '22 at 15:03
  • That's understandable but you still haven't post data sample. In any case, I think there's nothing wrong with `JSON_ARRAYAGG(DISTINCT col1)` **IF** the `col1` is actually a column that stores valid JSON value. Your `col1` is basically a generated JSON value from a few different columns and I'm guessing since you're grouping by `company`, your results returned duplicate `staff`. I suggest you apply the `DISTINCT` on the columns before you do `JSON_TABLE()` then only you do `JSON_ARRAYAGG()` - possibly without even including `DISTINCT`. – FanoFN Feb 07 '22 at 00:28

1 Answers1

12

I have came to a workaround to solve this issue, First addressing the issue that using JSON_ARRAYAGG(DISTINCT JSON_OBJECT()) Will simply not work.

So the workaround is CONCAT('[', GROUP_CONCAT(DISTINCT JSON_OBJECT("key": value)), ']'); this will result in something like this [ {"key": <value1>},{"key":<value2>}, ...]. this will return distinct result.

Note: You might need to cast this as JSON in the end this can be done like this => CAST(CONCAT('[', GROUP_CONCAT(JSON_OBJECT("key": value)), ']') AS JSON);

Ahmed Magdy
  • 1,054
  • 11
  • 16
  • 2
    This seemed to work for me. Not sure why MySQL doesn't support DISTINCT with JSON_ARRAYAGG. Seems like other DBs do. – Regis Sep 13 '22 at 15:43
  • Maybe in later versions this would be fixed until then this solves the problem – Ahmed Magdy Sep 13 '22 at 21:10
  • 1
    Let's hope MySQL implements this soon. https://bugs.mysql.com/bug.php?id=91993 MariaDB has DISTINCT in JSON_ARRAYAGG for at least 2 years already https://mariadb.com/kb/en/json_arrayagg/ – Tim Sep 23 '22 at 05:32
  • 5
    It's January 2023 and this is still necessary. Anyway, thank you very much for this. – Lucio Mollinedo Jan 06 '23 at 13:51