0

I have a table of products. These products have a product id. The product id can be shared by products. They are differentiated by the version id. For example bike has an id of 1. There are different kinds of bikes that share p_id of 1 but have v_id of 1-15. In the table there are many duplicates and i would like to find them. I made a query that finds the duplicates of the products. So i can see the count of versions of the product, but some of the products were inserted twice. I wrote a query to view the count of versions of the products, but now i would like to see all the duplicates. There is a column called product Value (the description) which is the indicator if its a duplicate.

Example

productKey  productValue    cout
16293   Bike    2
16292   Bike    2
16291   Bike    2
16290   Pads    2
16289   Pads    2
16288   Helmet  3
16286   Shoe    2

From here you can see bike and pads show up 3 times, but Helmet & Shoe only show up once. I would like to edit my duplicate query (or add to it), so that i can just get the items that show up more than once (aka Helmet and shoe would not appear ) The third column is the version, but that can be ignored.

Query

SELECT productKey, productValue, COUNT(*) as cout
FROM [Store].[dbo].[products]
GROUP BY productKey, productValue
HAVING COUNT(*) > 1
ORDER BY productKey DESC

Wanted result, something like this

productKey  productValue    cout
16293   Bike    2
16292   Bike    2
16291   Bike    2
16290   Pads    2
16289   Pads    2
AJ_
  • 3,787
  • 10
  • 47
  • 82
  • Post the expected result – Pரதீப் Sep 29 '17 at 14:16
  • cout would be the versions – AJ_ Sep 29 '17 at 14:19
  • The query you wrote seems correct... Wht is the problem?? – Rams Sep 29 '17 at 14:19
  • @Rams, it inculcates all the products. There will be duplicate keys no matter what because of versions. So the query i wrote finds them. It does nothing about the actual duplicates like 3 bikes. – AJ_ Sep 29 '17 at 14:23
  • you are trying to query for duplicates of product key and thier version and then get a count of these? – Danimal Sep 29 '17 at 14:28
  • This is still not clear. You consider 16293-Bike and 16292-Bike a duplicate. And as to product versions: you also consider 16293-v1-Bike and 16293-v2-Bike a duplicate? So you are merely looking for product names that occur more then once in the table, no matter which product keys or versions? Or are you saying that duplicate products within a product ID are not considered a problem, but duplicate products with multiple product IDs are? – Thorsten Kettner Sep 29 '17 at 15:09
  • @ThorstenKettner Looking for the names that appear more than once. That is based on my initial query that groups the different versions together. If i didn't group the versions, no matter what i would see double name for every product, because there are more than one version per product. – AJ_ Sep 29 '17 at 15:21
  • I've posted a new answer, now that I think I understand the task. This would have been very clear, by the way, had you shown some sample table data including products with one version only, some with more, some duplicates and some not. Your own query and decription were rather confusing than helpful, because of the lacking sample data. Anyway, your problem stems mainly from bad table design. There should be a product table (where you could easily find duplicate names; you could even forbid them with a database constraint) and a product version table. – Thorsten Kettner Sep 29 '17 at 15:35

6 Answers6

1

Use Count() Over() window aggregate function

Select * from 
(
select *,
       cout = Count(1)over(partition by productValue)
from [Store].[dbo].[products]
) a
Wher cnt > 1

If you want to use Group By then you need sub-query

Select * 
from  [Store].[dbo].[products] 
where productValue in (SELECT productValue 
                       FROM [Store].[dbo].[products] 
                       GROUP BY productValue HAVING COUNT(*) > 1) 
Pரதீப்
  • 91,748
  • 19
  • 131
  • 172
  • Yes, I think John is mistaken when he thinks that his query can be built upon to achieve what he wants. It seems he is really merely looking for duplicate product values, so you did well in ignoring his elaboration :-) – Thorsten Kettner Sep 29 '17 at 14:33
  • @ThorstenKettner - yep, thats why I asked him for expected result. All he want to do is, return all the records of a `productValue` if it is present more than once – Pரதீப் Sep 29 '17 at 14:36
1

You want to find products where two versions refer to the same thing. (At least this is what your query does.) And from these products that have duplicate versions you want to select only those products where the product title occurs in at least two problem products. That doesn't seem to make much sense, but here you go:

SELECT productKey, productValue, cout
FROM
(
  SELECT 
    productKey, 
    productValue, 
    COUNT(*) as cout,
    COUNT(*) OVER (PARTITION BY productValue) as cnt
  FROM Store.dbo.products
  GROUP BY productKey, productValue
  HAVING COUNT(*) > 1
) counted
WHERE cnt > 1
ORDER BY productKey DESC;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • This is really close, it clears out some of the no-duplicates, but still some remain (maybe something else is going on behind the scenes). I'll see if i cant edit this query to complete remove those duplicates. – AJ_ Sep 29 '17 at 15:20
0
You can select duplicate rows in subquery and select them in your main query



  SELECT 
         productKey, productValue,cout
    FROM 
         [Store].[dbo].[products] p1
    WHERE 
         productKey IN( 
               SELECT productKey
               FROM [Store].[dbo].[products]
               GROUP BY productKey, productValue
               HAVING COUNT(*) > 1
               )
0

Here you go:

DECLARE @T TABLE (productKey INT, productValue VARCHAR(30), cout INT);
INSERT INTO @T VALUES
(16293,   'Bike',  2),
(16292,   'Bike',  2),
(16291,   'Bike',  2),
(16290,   'Pads',  2),
(16289,   'Pads',  2),
(16288,   'Helmet',  3),
(16286,   'Shoe',  2);

SELECT *
FROM @T
WHERE  productValue IN (SELECT productValue FROM (SELECT productValue,COUNT(productValue) N FROM @T GROUP BY productValue) AS T WHERE T.N > 1) ;

Result:

+------------+--------------+------+
| productKey | productValue | cout |
+------------+--------------+------+
|      16293 | Bike         |    2 |
|      16292 | Bike         |    2 |
|      16291 | Bike         |    2 |
|      16290 | Pads         |    2 |
|      16289 | Pads         |    2 |
+------------+--------------+------+
Ilyes
  • 14,640
  • 4
  • 29
  • 55
0

What you really want (from what I understand now): Duplicate product names, but ignoring duplicates within a product number, because these are versions of the same product and hence of course not considered a problem.

So you are looking for product names where you count more than one product number. Use COUNT(DISTINCT ProductKey) for this.

select * 
from
(
  select products.*, count(distinct productkey) over (partition by productValue) as cnt
  from products
) counted
where cnt > 1;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • I get the following error, Incorrect syntax near 'distinct'. I know it says this is legal on the sql forms, but i cant get it to not give me this error. maybe it has something to do with the over – AJ_ Sep 29 '17 at 19:28
  • It turns out that `COUNT OVER` is very restricted in SQL Server and doesn't allow for `DISTINCT`. But there is an amazing solution for this. See here: https://stackoverflow.com/questions/11202878/partition-function-count-over-possible-using-distinct – Thorsten Kettner Sep 29 '17 at 20:45
0

Please try below SQL Query which could help you to achieve the result :

;WITH CTE
     AS (
     SELECT productKey,
            productValue,
            COUNT(*) OVER(PARTITION BY productValue) AS ActualCnt
     FROM products
     GROUP BY productKey,
              productValue)
     SELECT T.productKey,
            T.productValue,
            T.cout
     FROM CTE C
          INNER JOIN products T ON T.productValue = C.productValue
     WHERE C.ActualCnt > 1
     GROUP BY T.productKey,
              T.productValue,
              T.cout;

Desired Output :

productKey  productValue   cout
----------- ------------- -------
16289       Pads           2
16290       Pads           2
16291       Bike           2
16292       Bike           2
16293       Bike           2
Yogesh Sharma
  • 49,870
  • 5
  • 26
  • 52