0

My code is such:

SELECT COUNT(*)
FROM earned_dollars a
LEFT JOIN product_reference b ON a.product_code = b.product_code
WHERE a.activity_year = '2015'

I'm trying to match two tables based on their product codes. I would expect the same number of results back from this as total records in table a (with a year of 2015). But for some reason I'm getting close to 3 million.

Table a has about 40,000,000 records and table b has 2000. When I run this statement without the join I get 2,500,000 results, so I would expect this even with the left join, but somehow I'm getting 300,000,000. Any ideas? I even refered to the diagram in this post.

Community
  • 1
  • 1
intA
  • 2,513
  • 12
  • 41
  • 66
  • what exactly are you attempting to count? – Kritner May 11 '15 at 17:58
  • More than one row per product_code in both tables? – dnoeth May 11 '15 at 17:58
  • My guess would be since the same product could be sold in multiple years you're getting those as well... consdier moving the where to the ON and limiting results to where b.Product_Code is not null. – xQbert May 11 '15 at 18:02
  • Is product_code unique in product_reference? – paparazzo May 11 '15 at 19:01
  • What your query is asking is "How many pairs of earned dollars and product references are there for 2015, including earned dollars with no product references." your problem is that that's not what you intended to ask, but it's not clear what you did intend. If you intended to ask "How many rows in earned_dollars have matching products," you need @XenoPuTtSs answer. If you meant something else, then please clarify. – Alex Weitzer May 11 '15 at 22:56

4 Answers4

4

it means either your left join is using only part of foreign key, which causes row multiplication, or there are simply duplicate rows in the joined table.

use COUNT(DISTINCT a.product_code)

Pavel Gatnar
  • 3,987
  • 2
  • 19
  • 29
0

What is the question are are trying to answer with the tsql?

instead of select count(*) try select a.product_code, b.product_code. That will show you which records match and which don't.

Should also add a where b.product_code is not null. That should exclude the records that don't match.

b is the parent table and a is the child table? try a right join instead.

SQLburn
  • 129
  • 5
  • This is really part of a bigger INSERT statement to bring product_name from my product_ref table into the earned_premium table based on matching the product_code . I've just broken it out into a SELECT here to demonstrate the issue I'm having. – intA May 11 '15 at 20:20
0

Or use the table's unique identifier, i.e.

SELECT COUNT(a.earned_dollars_id)

0

Not sure what your datamodel looks like and how it is structured, but i'm guessing you only care about earned_dollars?

SELECT COUNT(*)
FROM earned_dollars a
WHERE a.activity_year = '2015'
and exists (select 1 from product_reference b ON a.product_code = b.product_code)
XenoPuTtSs
  • 1,254
  • 1
  • 11
  • 31