0

I'm having an issue where I am running a query and each time it produces different results. I'm working with relatively large sets of data (700,000+) and the differences are about 50-100.

I'm taking raw data from a staging table on a specific date (so the data is not changing) and then I'm loading that into a temp table.

Each time I check the count of the Temp table, it is the same so the issue is not there. However on the next part, I'm adding some grouping around dates and other specific details. And when I run this, it will have a different population each time it is run.

As an example, this is a very basic version of my script:

IF OBJECT_ID(N'tempdb..#Table1') IS NOT NULL
DROP TABLE [#Table1]
SELECT ORDER ID
,CITY
,DATE
,AMOUNT
,CURRENCY
INTO #TABLE1
FROM #TABLE2 --This is the data from the staging table

--GROUPING
IF OBJECT_ID(N'tempdb..#Table3') IS NOT NULL
DROP TABLE [#Table3]
SELECT CITY
,DATE
,ROUND(SUM(CONVERT(FLOAT, AMOUNT)),-1)
,CURRENCY
INTO #TABLE3
FROM #TABLE1
GROUP BY CITY,DATE,CURRENCY
HAVING ROUND(SUM(CONVERT(FLOAT, AMOUNT)),-1) = 0

Each time I run this, the record count for #Table3 changes. Any ideas what could be causing this? as mentioned, #Table1 shows the full data set consistently and the underlying source data has not changed.

Dale K
  • 25,246
  • 15
  • 42
  • 71
  • Your SQL is invalid. The `GROUPING` keyword can't be used like that. What does `\*` mean? (If it's a SQL comment then the syntax is `/* */`). Why are you backslash-escaping your table names? Also, is there a reason for using `#temporary` tables instead of (the much-easier-to-use, imo) table-variables? – Dai Mar 30 '22 at 21:03
  • Where is `#TABLE2` coming from? – Dai Mar 30 '22 at 21:05
  • Hi - First time posting and the \* appeared but I've edited. – Reyrey88887 Mar 30 '22 at 21:12
  • Table 2 is the staging table. The example I've used a very basic version of what I am doing but the reason for using various temp tables is because each temp table has different logic and they come together to be joined at the end. Not sure if that's the right way but just how its been done before and I've inherited this logic. – Reyrey88887 Mar 30 '22 at 21:13
  • 6
    Side-note: it looks like you're using `float` to represent money/currency values. [Don't do that](https://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency). – Dai Mar 30 '22 at 21:17
  • What about `#TABLE4`? The reason you're seeing non-deterministic behaviour in your query is because your SQL script runs in a muddy environment. You need to ensure you have a consistent state every time the script runs - and using `#temporary` tables makes that harder because `#temporary` tables' lifespan is longer than the script in which they're created-in, which is why (imo) `#temporary` tables (and `tempdb` too) should be avoided and people should stick to _only_ table-variables (and table-parameters). Only use `#temporary` tables if you _absolutely_ have to (e.g. in a legacy `PROCEDURE`) – Dai Mar 30 '22 at 21:19
  • I see you changed `#table4` to `#table1`. Is there any reason why `#table1` can't be made into a normal table? – Dai Mar 30 '22 at 21:34
  • 1
    Change your HAVING ROUND(SUM(CONVERT(FLOAT, AMOUNT)),-1) = 0 to HAVING ROUND(SUM(CONVERT(DECIMAL(38,8), AMOUNT)),-1) = 0 and check again, I believe you are getting some floating point errors, which could always happen with the type float – Surendra Mar 30 '22 at 21:41
  • I don't think you're using `ROUND` correctly here: `HAVING ROUND(SUM(CONVERT(FLOAT, AMOUNT)),-1) = 0` - when you use `-1` it means the _left-hand-side_ will be rounded, which in this context I think gives you a meaningless value... – Dai Mar 30 '22 at 22:01
  • 2
    No one can help you. Your code does not filter rows in any way so it is impossible for your end result to vary so long as the input data is constant. You have already expressed that your actual logic is more complicated - so it seem you have a lot of debugging work ahead of you. I must agree with the "say no to temp tables" comment. Too many times I've seen this pattern of – SMor Mar 30 '22 at 22:34
  • @Dai thank you so much for the help. Looks like I do need to redo alot of the logic. – Reyrey88887 Mar 31 '22 at 09:33
  • @Surendra thank you vm for the suggestion. I did the change as you mentioned and its fixed the errors and working fine. Thanks again. – Reyrey88887 Mar 31 '22 at 09:34

1 Answers1

2

I recommend you change your script to use only table-variables. Like so:

  • I assume that you absolutely have to keep on using #table2 for source-data.

    • But please consider changing it to a normal (non-temporary, non-table-variable) table if it contains persisted or expensive-to-produce data.
  • Your query has HAVING ROUND( x, -1 ) = 0 - which I don't think is useful - nor particularly clear.

    • ROUND( x, -1 ) means "round to the nearest tenth-place (including all digits to the right, including digits to the right of the radix place).
      • e.g. ROUND( 12345.67, -1 ) is 120.00.
    • ROUND( x, -1 ) will only evaluate to 0 when -5 < x < 5 (i.e. -4.999999… <= x <= 4.999999…)
    • The same logical condition can be much-simply expressed using just comparison operators as HAVING x > -5 AND x < 5.
      • I assume you were trying to avoid having SUM() mentioned twice, as though you're concerned that HAVING SUM( Amount ) > -5 AND SUM( Amount ) < 5 would make SQL Server compute SUM twice.
        • SQL Server might be kinda dumb, but it isn't that dumb, I can assure you.
        • Another alternative is to use a CTE with the SUM as a named column in the initial SELECT and move the predicate to an outer WHERE clause instead of HAVING. This is the approach used in my answer below (look for WITH g AS).
  • My code uses double-quotes " to escape object-identifiers instead of [] square-brackets, this is because ISO/ANSI SQL uses " so it's the right thing to do, and makes your already-very-proprietary SQL code ever-so-slightly more portable.

  • I wrapped everything in a BEGIN TRY and BEGIN TRANSACTION - even though no normal tables are being mutated, because I assume that eventually you are going to make changes you'll want to persist, so I thought I'd scaffold that for you.



SET XACT_ABORT ON; <-- Everyone should be doing this: https://stackoverflow.com/questions/1150032/what-is-the-benefit-of-using-set-xact-abort-on-in-a-stored-procedure

BEGIN TRY;

    BEGIN TRANSACTION;

    -- As this script depends on the state of #table2, you should explicitly validate its contents before continuing:

    DECLARE @expectedTable2RowCount        int           = 1234;
    DECLARE @expectedTable2CurrencySum     decimal(19,2) = 1234567.89;

    DECLARE @actualTable2RowCount          int           = ( SELECT COUNT(*) FROM #table1 );
    DECLARE @actualTable1Table1CurrencySum decimal(19,2) = ( SELECT SUM( "Amount" ) FROM #table2 );

    IF @expectedTable2RowCount <> @actualTable2RowCount OR @expectedTable2CurrencySum <> @actualTable2Table1CurrencySum
    BEGIN
        THROW 50000, '#table1 contains unexpected data. Aborting script.', 0;
    END;

    ------------------

    DECLARE @myTable2Copy TABLE (
        "Order Id" int           NOT NULL,
        "City"     nvarchar(50)  NOT NULL,
        "Date"     date          NOT NULL,
        "Amount"   decimal(19,2) NOT NULL,
        "Currency" char(3)       NOT NULL,

        PRIMARY KEY ( "Order Id" )
    );

    INSERT INTO @myTable1Copy (
        "Order Id", "City", "Date", "Amount", "Currency"
    )
    SELECT
        "Order Id", "City", "Date", "Amount", "Currency"
    FROM
        #table2;

    ------------------

    DECLARE @myGroupedData TABLE (
        "City"      nvarchar(50)  NOT NULL,
        "Date"      date          NOT NULL,
        "Currency"  char(3)       NOT NULL,

        "SumAmount" decimal(19,2) NOT NULL,

        PRIMARY KEY ( "City", "Date", "Currency" )
    );

    WITH g AS (
        SELECT
            "City",
            "Date",
            "Currency",
            SUM( Amount ) AS SumAmount
        FROM
            @myTable2Copy
        GROUP BY
            "City",
            "Date",
            "Currency"
    )

    INSERT INTO @myGroupedData (
        "City", "Date", "Currency", SumAmount
    )
    SELECT
        "City", "Date", "Currency", SumAmount
    FROM
        g
    WHERE
        g.SumAmount > -5.00
        AND
        g.SumAmount <  5.00;

    ------------------

    COMMIT TRANSACTION;

END TRY;
BEGIN CATCH;

    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;

END CATCH;

Dai
  • 141,631
  • 28
  • 261
  • 374