0

Our database has over 100k products in it and we feed to our clients every 10 mins a report with prices and many other data. I am currently trying to re-write the queries to streamline the whole process. At this point I should point out my SQL knowledge is self taught over the last few months so just a novice really.

The part I am really banging my head against a wall is:

The following WHERE statement returns results event when there have been no changes and I cant work out want is causing it, or what I am doing wrong.

UPDATE variant_external_analysis
SET vaea_d_1 = GETDATE(), -- last updated
    vaea_n_3 = ROUND((vaea_n_12 + vaea_n_18) * (SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20'),2) -- product price
FROM variant_external_analysis
JOIN variant_detail ON vad_id = vaea_vad_id
WHERE 
    ((vaea_n_3 <> ROUND((vaea_n_12 + vaea_n_18) * (SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20'),2)) -- has price change
      OR vaea_n_3 IS NULL) -- Is price field empty
    AND vad_variant_code BETWEEN 1000 AND 990000 -- product range for client

This should update the price vaea_n_3 if it has changed or IS NULL and at the same time change the last updated date vaea_d_1 under the same conditions.

However for some reason even when I run the query multiple times in a row it still returns about 6k products even though vaea_n_3 has not changed for any product and none have a NULL value

I need vaea_d_1 so the price file we send to our clients can just include products where the price has changed rather than all 100k products.

Can anyone please what is wrong with this simple looking query?


SOLVED THANK YOU

From a combination of a few different peoples answers it proved that the problem was with the

(SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20')

So I have re-written the query and changed where/how I store my global variables:

UPDATE variant_external_analysis
SET vaea_d_1 = GETDATE(), -- last updated
    vaea_n_3 = ROUND((vaea_n_12 + vaea_n_18) * jna_n_7, 2) -- product price
FROM job_number_analysis, variant_external_analysis
JOIN variant_detail ON vad_id = vaea_vad_id
WHERE ((vaea_n_3 <> ROUND((vaea_n_12 + vaea_n_16) * jna_n_7, 2)) -- has price changed
    OR vaea_n_3 IS NULL) -- Is price field empty
    AND vad_variant_code BETWEEN 1000 AND 990000 -- product range for client
    AND jna_jn_id = 59

This now works correctly and only updates when something has changed. Thank you to everyone of put ideas forward.

Dan
  • 5
  • 4
  • What does `select vaea_n_3, ROUND((vaea_n_12 + vaea_n_18) * (SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20'),2) from where ` return? – GSerg Nov 07 '18 at 18:51
  • Please post an example of data that you expect to be updated that is not. Or vice versa. – Tab Alleman Nov 07 '18 at 18:57
  • As mentioned in the question if it is run twice then the second time it should NOT update anything, as nothing has changed between the 2 times it is run. However it is still updating about 6k products. For some reason the WHERE clause is not working on those products, I have done all the checks I can think of and cant find the cause. – Dan Nov 07 '18 at 19:07
  • 2
    comparing floats is notoriously tricky. https://stackoverflow.com/questions/16149966/comparing-2-float-values-in-sqlserver maybe try another option. – Jeremy Nov 07 '18 at 19:18
  • just for the comparison portion, multiply by 100, cast to int, then compare the integers. this will ensure that the comparison is accurate to 2 decimal places like you want. – Jeremy Nov 07 '18 at 20:06
  • Tip: The best practice is to assign a _meaningful_ alias to each table being joined, e.g. `... FROM variant_external_analysis as vea ...`, and to use the aliases on all column references. We can't tell which table supplies which values and it won't be obvious when someone looks at this code in production a week/month/year/decade from now. – HABO Nov 07 '18 at 21:24

2 Answers2

0

Not sure what your issue is but here is a way to make your query much faster:

SET vaea_d_1 = GETDATE(), -- last updated
      vaea_n_3 = ROUND((vaea_n_12 + vaea_n_18) * (SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20'),2) -- product price

FROM variant_external_analysis
    JOIN variant_detail ON vad_id = vaea_vad_id

WHERE ((vaea_n_3 <> ROUND((vaea_n_12 + vaea_n_18) * (SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20'),2)) -- has price change
      OR vaea_n_3 IS NULL) -- Is price field empty
      AND vad_variant_code BETWEEN 1000 AND 990000 -- product range for client

is the same as

SET vaea_d_1 = GETDATE(), -- last updated
      vaea_n_3 = ROUND((vaea_n_12 + vaea_n_18) * CAST(jn.jn_our_ref AS FLOAT),2) -- product price

FROM variant_external_analysis
    JOIN variant_detail ON vad_id = vaea_vad_id
LEFT JOIN job_number AS JN ON jn_id='20' 

WHERE ((vaea_n_3 <> ROUND((vaea_n_12 + vaea_n_18) * CAST(jn.jn_our_ref AS FLOAT),2) -- has price change
      OR vaea_n_3 IS NULL) -- Is price field empty
      AND vad_variant_code BETWEEN 1000 AND 990000 -- product range for client

this means you won't make 2 additional queries for each row.

Hogan
  • 69,564
  • 10
  • 76
  • 117
0

One possibility is ANY of these three items is NULL:

  • vaea_n_12
  • vaea_n_18
  • This subquery:

    SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20'
    

If even just one of those is NULL, the result of the entire ROUND() function must also be NULL, and therefore the vaea_n_3 <> ROUND(...) comparison will be FALSE. This is true even when vaea_n_3 is NOT NULL, which now fails both sides of the OR condition.

To help solve this, I suggest writing it as a SELECT query first:

SELECT vaea_n_3, vaea_n_12, vaea_n_18,
    (SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20') as SubQuery,
    ROUND((vaea_n_12 + vaea_n_18) * (SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20'),2) AS Rounded,
    CASE WHEN vaea_n_3 <> ROUND((vaea_n_12 + vaea_n_18) * (SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20'),2) THEN 1 ELSE 0 END As Different

FROM variant_external_analysis
JOIN variant_detail ON vad_id = vaea_vad_id
WHERE vad_variant_code BETWEEN 1000 AND 990000 AND
    ( vaea_n_3 IS NULL OR
      vaea_n_3 <> ROUND((vaea_n_12 + vaea_n_18) * 
                        (SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20'),2)
    )

Then you can look for rows where the results are not what you expect and you'll have the data right there to show you why.

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
  • SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20' is a global constant so is never NULL – Dan Nov 07 '18 at 19:33
  • Also, a subquery that has to re-run per row is not the same as a constant. It may produce the same result, but it can also sometimes need to re-evaluate that many times to be sure, and in some isolation levels it's possible to have the data change over the course of a query. – Joel Coehoorn Nov 07 '18 at 19:42
  • SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20' is a global constant so is never NULL. Neither vaea_n_12 nor vaea_n_18 have any NULL values anywhere in the table – Dan Nov 07 '18 at 19:43
  • and did you run the select query? – Joel Coehoorn Nov 07 '18 at 19:44
  • SELECT CAST(jn_our_ref AS FLOAT) FROM job_number WHERE jn_id='20' is just a value of 1.02 We change that figure fairly regularly and it is referred to in lots of queries. – Dan Nov 07 '18 at 19:51
  • Better to use the join suggested in other answer, or select it into a variable once before the query and reference it as a variable. It's bad to use it as a sub query like this, both for performance and because there really are some situations that can let this change in the middle of your query running. – Joel Coehoorn Nov 07 '18 at 19:52