-2

I am facing an issue with a moderately simple-looking SELECT SQL Query.

My requirement: For One specific company, if there are multiple records with the same date, I want only the first record of that date, not all from that date.

Given table:

User company joining_date
Devika Google 1/3/2021
Aparna Apple 12/9/2021
Suresh Google 10/2/2022
Rajesh Apple 12/9/2021
Arun Google 10/2/2022

Expected Output:

count users date company
2 Devika,Suresh 1/3/2021,10/2/2022 Google

My Output:

count users date company
3 Devika,Suresh, Arun 1/3/2021,10/2/2022 Google

My Approach:

select 
    count(user) as count, 
    group_concat(DISTINCT user) as users, 
    group_concat(DISTINCT date) as date
    company
from employee 
GROUP by company 
having company = "Google";

As per the requirement, I grouped the records by company name. I only want the results for Google, so I added a condition for that. Now, if I use the Distinct keyword, I will not get duplicate dates, but I will still have three counts of users instead of two, and three usernames instead of two. I want to skip "Arun" from this result.

How can I achieve this logic?

Vivek
  • 19
  • 7
  • 2
    What is your logic for keeping Suresh and not Arun, both have the same date. – Stu Jul 22 '23 at 09:58
  • An individual's name and count are not crucial in the result set as per the requirement. Consider it as a hypothetical scenario. I couldn't find a way to make it happen in this query. – Vivek Jul 22 '23 at 10:05

3 Answers3

1

Assuming you're using MySql 8+ the ubiquitous approach is to use row_number to produce the desired rows before you aggregate. The order by critereia determins which is the "first" row in each partition (in this case rows with the same date):

select count(user) as count, 
    group_concat(user) as users, 
    group_concat(joining_date) as date,
    company
  from (
    select *, Row_Number() over(partition by joining_date order by user desc) rn
    from t
    where company = 'Google'
)t
where rn = 1
group by company;

See this example Fiddle

Stu
  • 30,392
  • 6
  • 14
  • 33
  • Good answer. I never used Row_number(), over, partition by clauses before. It would be better if there was a simpler way with more traditional clauses. Thanks for the response. – Vivek Jul 22 '23 at 10:36
  • What do you mean by "simpler" and "more traditional"? This kind of queries is neither complicated nor not traditional. Have a look at "window functions", the first of them were introduced to Standard SQL about 20 years ago! It seems you are "behind the times" ;) Generally spoken, if you think such queries are complicated, you can avoid them and do such logic in your application instead. – Jonas Metzler Jul 22 '23 at 11:01
  • I fail to understand what you could mean by *simpler* - this is about as simple a solution as you could possibly get using standard SQL, you will find it used hundreds if not thousands of times for answers to similar questions. – Stu Jul 22 '23 at 11:35
  • @Stu I found a much easier way to achieve the desired result without using the Window function. can you review my answer and suggest to me if there is any flaw in it? – Vivek Jul 31 '23 at 17:47
  • Your solution is technically invalid SQL and only works in MySql because your MySql config is not using `only_full_group_by` mode which is not recommended, @Vivek. – Stu Jul 31 '23 at 17:49
  • @JonasMetzler You too, please review the new answer that I achieved without using the Window function. Suggest to me if there are any flaws. thanks... – Vivek Jul 31 '23 at 17:52
  • @Stu I see. config would be an issue. – Vivek Jul 31 '23 at 17:58
1

Although, Stu's answer is correct. After some trial and error, I found a much easier way to achieve the desired result where the window function is not required.

My New Approach:

select 
    count(tmp.id) as count, 
    group_concat(tmp.user) as users, 
    group_concat(tmp.joining_date) as date, 
    tmp.company as company 
from 
    (
        select * from employee 
        where company = "Google" 
        group by joining_date
    ) as tmp 
group by tmp.company;

As per this approach, we can use subquery first where we apply conditions for the company and group it by the joining date to ignore the second record with duplicate joining date. Then into the main query, we group by with the company to get the desired result.

Note: To make this work, the default config for ONLY_FULL_GROUP_BY has to be disabled.

Vivek
  • 19
  • 7
  • Why should this be "much easier"? It is NOT recommended to avoid window functions. They were introduced (20 years ago!) to use them, not to prevent them. – Jonas Metzler Jul 31 '23 at 19:11
  • This Query provides better performance than Query with Window function. This query's execution time was 0.015 sec while with the Window function, it was 0.156 sec on the same dataset. However, as Stu mentioned in his response, this query is not compatible with the default config of only_full_group_by for the MySQL database, @JonasMetzler – Vivek Aug 01 '23 at 04:56
  • This issue: "this query is not compatible with the default config of only_full_group_by for the MySQL database" is very critical and just means the query is totally incorrect and shouldn't be shown as an answer. People with a similar issue might copy that query and encounter lots of issues. – Jonas Metzler Aug 01 '23 at 05:00
  • No, it doesn't mean the query is incorrect. It just means that it is not compatible with the "default" config of MySQL. If we change the MySQL config (that we already did in our projects previously) it will work, although not recommended. It all depends if this group by setting truly affects your database logic or not. In a nutshell, it is subjective to the project requirement, @JonasMetzler. – Vivek Aug 01 '23 at 05:16
  • Please don't misuse comments for opinionated discussions like "I asked a question because I was unable to solve a SQL problem, but I am able to know everything better than the experts here." Your query is technically INVALID (it seems you don't understand how bad this is!), no matter what strange config of MYSQL might allow it anyway. If another config allows to execute queries without using FROM, those queries are also incorrect. No one should write incorrect SQL queries or even show them as an answer. – Jonas Metzler Aug 01 '23 at 05:20
-2

You can utilize a subquery or CTE to achieve the filtering and then perform the GROUP_CONCAT separately.

SELECT
    COUNT(user) AS count,
    GROUP_CONCAT(user) AS users,
    GROUP_CONCAT(joining_date) AS date,
    company
FROM (
    -- Subquery starts here
    SELECT
        user,
        joining_date,
        company,
        ROW_NUMBER() OVER (PARTITION BY company, joining_date ORDER BY user) AS row_num
    FROM employee
    WHERE company = "Google"
    -- Subquery ends here
) AS subquery
WHERE row_num = 1
GROUP BY company;
  1. The subquery selects the necessary columns from the employee table and adds a row_num column using the ROW_NUMBER() window function. The PARTITION BY clause ensures that the numbering is done separately for each company and joining date.
  2. The WHERE clause of the subquery filters the results to only include records for the company "Google".
  3. The outer query then filters the results from the subquery to only include rows with row_num equal to 1. This effectively selects the first record for each company and joining date combination.
  4. Finally, the outer query uses GROUP_CONCAT to concatenate the user names and joining dates for the selected records, and the results are grouped by the company.

This approach brings the strengths of SQL's window functions, filtering capabilities, and grouping functionalities to efficiently produce the desired output (specific requirement of skipping duplicate dates for the specified company).

Rachmat
  • 76
  • 4
  • How have you used subquery here? I don't see one. – Vivek Jul 22 '23 at 10:27
  • I find your response confusing. Indeed, the duplicate dates are removed as per my query. Can you put it differently? – Vivek Jul 22 '23 at 10:30
  • Hey I apologize, I gave the wrong snippet code. The previous code is another query that I tested locally, but it failed. Sorry again – Rachmat Jul 22 '23 at 10:44
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – hossein Jul 27 '23 at 08:38
  • Sorry for being late, I already edit the answer and give some additional info – Rachmat Aug 15 '23 at 03:08