0

I am using MySQL for my Shiny App. I am using this query for App and it is running perfectly okay.

Select
    concat(monthname(date_of_test), '-', year(date_of_test)) as 'Time',
    product_group AS 'ProductGroup',
    Pass,
    Case
        when pass='N' then @no:=count(distinct serial_number)
        when pass='Y' then count(distinct serial_number)-@no
    end as Count
from test_data 
where 
    year(date_of_test)=2018 
    and product_group='BHO'
    and month(date_of_test) between 3 and 4
group by
    product_group,
    month(date_of_test),
    pass

But I need to change it in MS SQL Server. I have tried with declaring as variable and use it as in SQL Server.

My try in SQL Server:

declare @no int;
set @no = 0;
Select
    CONCAT(datename(MM, date_of_test), '-', DATENAME(YY,date_of_test)) as 'Time',
    product_group AS 'ProductGroup',
    Pass,
    case
        when pass ='N' then  @no = count(distinct serial_number)    
        when pass ='Y' then count(distinct serial_number)- @no  
    end as 'Count'
from test_data 
where
    year(date_of_test)=2018 
    and product_group='BHO'
    and month(date_of_test) between 3 and 5
group by
    product_group,
    CONCAT(datename(MM, date_of_test),
    '-',
    DATENAME(YY,date_of_test)),
    pass    

The query without the variable is like:

 Select
    CONCAT(datename(MM, date_of_test), '-', DATENAME(YY,date_of_test)) as 'Time',
    product_group AS 'ProductGroup',
    Pass,
    case
        when pass ='N' then count(distinct serial_number)    
        when pass ='Y' then count(distinct serial_number)
    end as 'Count'
from test_data 
where 
    year(date_of_test)=2018 and product_group='BHO'
    and month(date_of_test) between 3 and 4
group by
    product_group,
    CONCAT(datename(MM, date_of_test),
    '-',
    DATENAME(YY,date_of_test)),
    pass 

and it is producing the following output:

enter image description here

The desired output was like which is from MySQL. Please take a look where Pass=Y then the value of Pass=N subtracted from it.

enter image description here

It is showing an error.

My initial assumption: in MySQL I can initialize variable in query and can use it within it,but in MS SQL Server may be there is other rules. My syntax or process can be wrong.

Select Count(distinct serial_number) from Test_Data where year(date_of_test)=2018 and product_group='BHO'and month(date_of_test)=4

503

Select Count(distinct serial_number) from Test_Data where year(date_of_test)=2018 and product_group='BHO' and PASS='Y' and month(date_of_test)=4

503

Select Count(distinct serial_number) from Test_Data where year(date_of_test)=2018 and product_group='BHO' and PASS='N'and month(date_of_test)=4

71

SO all 503 product(serial number) gone for multiple test and get Pass=Y value but 71 product have gone through the same test where they have failed in some case where it is noted as Pass=N. So if I can calculate the (distinct serial_number with PASS=y)-(distinct serial_number with PASS=N) then it will give number of products who pass all the tests.

I can do this and the result is:

Select CONCAT(datename(MM, date_of_test),'-',DATENAME(YY,date_of_test)) as 'Time',product_group AS 'ProductGroup',
                    (Count(Distinct case when PASS='Y' then serial_number end)-Count(Distinct case when PASS='N' then serial_number end)) 
                 as ' All Test Passed',
                 Count(Distinct case when PASS='N' then serial_number end) as 'Min 1 Test Failed'
               from test_data 
               where 
               year(date_of_test)=2018 
               and 
               month(date_of_test) between 3 and 4
               and product_group='BHO'
               group by product_group,CONCAT(datename(MM, date_of_test),'-',DATENAME(YY,date_of_test))

And the result is

enter image description here

mkrieger1
  • 19,194
  • 5
  • 54
  • 65
Subhasish1315
  • 938
  • 1
  • 10
  • 25
  • "It is showing error .can anybody help me on this?" Place the error here? – Raymond Nijland Sep 27 '18 at 10:11
  • The error is --->Msg 102, Level 15, State 1, Line 5 Incorrect syntax near '='. – Subhasish1315 Sep 27 '18 at 10:14
  • Never mind mine last comment the error is on a other line didn't saw the comment update.. looks like this part is wrong `@no = count(distinct serial_number)` – Raymond Nijland Sep 27 '18 at 10:19
  • @RaymondNijland thanks for pointing. but is there any other process to do this operation? – Subhasish1315 Sep 27 '18 at 10:22
  • 2
    Hard to suggest something without example data and expected results.. Read https://stackoverflow.com/help/how-to-ask section "Help others reproduce the problem" only SQL queries isn't enough. – Raymond Nijland Sep 27 '18 at 10:39
  • MSSQL does not allow you to use variables like that (assign a variable and get the row data at the same time). And the main use case for using variables like that in MySQL is to make up for the lack of window functions (e.g. to simulate a [`row_number()`](https://stackoverflow.com/a/1895127/6248528)). And you seem to do the same here. You'll have to rewrite the code, probably using window functions (but there are other options, see the variaty of answers in the linked question). Without seeing sample data and result, it's hard(er than necessary) to tell you how to rewrite it exactly. – Solarflare Sep 27 '18 at 10:49
  • @Subhasish1315 what is this query supposed to do? Why are you trying to set and *use* that `@no` variable? MySQL allows a lot of quirky syntax or even outright bugs to work, only to break at the next point update, or the moment MySQL tries to parallelize a query – Panagiotis Kanavos Sep 27 '18 at 10:49
  • @Subhasish1315 Using a variable like that for example may be an attempt to emulate the windowing functions that were introduced in MySQL 8 and are already available in SQL Server. It doesn't work well either - it requires that results are returned in a very specific order, which will break if eg the server uses parallel processing – Panagiotis Kanavos Sep 27 '18 at 10:50
  • @Subhasish1315 so you are trying to calculate the *difference* between the Y N group but return the N and difference counts in the same column? Why? You'd get an invalid Y count this way – Panagiotis Kanavos Sep 27 '18 at 10:55
  • @PanagiotisKanavos I am trying to count the distinct serial_number only with Pass= Y and Pass=N but one serial_number can have Pass=N or Pass=Y both with multiple records.....Need to check with serial number with all Pass=Y and other at least One Pass=N.. – Subhasish1315 Sep 27 '18 at 10:59
  • @Subhasish1315 even in MySQL the result is wrong then. If there were no duplicates you should return 71 and 503 for April. Your query though returns 71 and 432 whether there are duplicates or not – Panagiotis Kanavos Sep 27 '18 at 11:03
  • @PanagiotisKanavos from above 1st pic : For April 2018 PASS=N count=71 and PASS=Y count=503... where in 2nd Pic For April 2018 PASS=N count=71 and PASS=Y count=432 which is 503-71=432...thats I am trying to achieve. I can do it other way where it is producing 2 columns but I need it in single column for preparing graph – Subhasish1315 Sep 27 '18 at 11:08
  • @Subhasish1315 I did see the pictures. I'm saying that those values don't make sense though. You get the N count *including* any possible duplicates but a difference for Y whether there are duplicates or not. – Panagiotis Kanavos Sep 27 '18 at 11:10
  • @PanagiotisKanavos,,,I have just checked the data now as I have done previously with Excel Pivot and by Manually counting – Subhasish1315 Sep 27 '18 at 11:10
  • @Subhasish1315 checked *what*? You haven't explained what logic you used to get those counts. Excel Pivot *won't* do that. What if there were 503 Y answers without N duplicates? You'd still get 432 instead of 503. You'd have to use a self-join to find `serial_number` entries that have both Y and N answers – Panagiotis Kanavos Sep 27 '18 at 11:12
  • @PanagiotisKanavos can you check the edits in question. – Subhasish1315 Sep 27 '18 at 11:25
  • @Subhasish1315 in other words, given *one* pass count you want to find the *other* pass count. With MySQL 8's windowing and analytic functions you'd use `LEAD()` or `LAG()`. The use of this variable is a *hack* that tries to emulate `LAG()`. It's also one that will fail if MySQL returns the results in any different order, as it has any right to if there's no `ORDER BY` – Panagiotis Kanavos Sep 27 '18 at 11:30
  • @PanagiotisKanavos Query in MySQL is perfectly working. Looking for alternatives in MS SQL Server. – Subhasish1315 Sep 27 '18 at 11:33
  • @Subhasish1315 only by chance since there's no ORDER BY. That specific pattern has been replaced by LAG in MySQL which works always – Panagiotis Kanavos Sep 27 '18 at 11:39
  • @PanagiotisKanavos can you please check the edits in questions. I can get the results in 2 columns. but I need in Single column using PASS – Subhasish1315 Sep 27 '18 at 11:39
  • @Subhasish1315 I have, which is why I asked all those questions. If you search SO you'll find a lot of questions from people who found that using variables or other undocumented behaviour would break unexpectedly or lead to *big* performance degradation after updates. – Panagiotis Kanavos Sep 27 '18 at 11:52

1 Answers1

0

Looks like the MySQL query is trying to emulate the LEAD() or LAG() analytic functions introduced in MySQL 8. Those were already available in SQL Server since 2012 (I think).

The MySQL query assumes results will be returned in a certain order, even though there's no ORDER BY clause. It also assumes there's no parallel processing, at least when the variable is processed.

The entire CASE can be rewritten as :

count(distinct serial_number) - 
LAG(count(distinct serial_number),1,0) OVER (
                        PARTITION BY product_group,month 
                        ORDER BY pass)

This partions the GROUP BY results by product_group,month and then orders them by pass. LAG then returns the previous count in that partition, or 0 if there's no previous row. This means that LAG() will return 0 for N and N's count for Y

The complete query would look like this :

select 
    year(date_of_test),
    month(date_of_test),
    product_group,
    pass,
    count(distinct serial_number) - 
    LAG( COUNT(distinct serial_number),1,0) 
         OVER ( PARTITION BY product_group,month(date_of_test) 
                ORDER BY pass)
from test_data
where 
    year(date_of_test)=2018 
    and month(date_of_test) between 3 and 4
    and product_group='BHO'
group by 
    year(date_of_test),
    month(date_of_test),
    product_group,
    pass

A similar query could work with MySQL 8.

Performance and the query itself can be improved a lot by using a Calendar table. A calendar table is a table pre-populated with eg 20 years of dates which contains extra fields like month, month name, week number, work or holiday etc. This makes writing date-based queries a lot easier and the resulting queries a lot faster.

Assuming there was a calendar table with only a few basic fields like date, year, month, mont_name, one could simplify the query to this :

select 
    calendar.month_name + '-' + calendar.year,
    product_group,
    pass,
    count(distinct serial_number) - 
    LAG( COUNT(distinct serial_number),1,0) 
         OVER ( PARTITION BY product_group,calendar.month 
                ORDER BY pass)
from 
    test_data
    inner join calendar on date_of_test=date
where 
    calendar.year =2018 
    and calendar.month between 3 and 4
    and product_group='BHO'
group by 
    calendar.year,
    calendar.month,
    product_group,
    pass

This query can take advantage of indexes on the date_of_test, calendar.date, calendar.year and calendar.month columns to find results.

Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
  • Thanks for the Answer. From your past comments I came to know about introduction about LAG,LEAD function in SQL Server 2012. I started working on that as per your guidance. Using LAG I was able to print the Pass=N value but not the calculation.Thanks for clearing the path. Also Thanks for the Calendar table guidance. – Subhasish1315 Sep 27 '18 at 12:38