0

I created a procedure in SQL that took a large dataset and output a large XML file. Due to memory issues I am trying to rejig it so that it runs for smaller subsets which I add together elsewhere. I added the named subquery, 'current' that filters the data down to a single IID entity then run the same code on that, but for some reason it takes a lot longer than I expected.

WITH 

    current AS (SELECT * FROM xml.MainDataset WHERE IID = '27'), 
    sources AS  (SELECT DISTINCT stage, IID, name FROM current ),
    pop AS  (SELECT DISTINCT stage, IID, pop, type, plat FROM current ),
    Singles AS  (SELECT DISTINCT stage, IID, pop, type, plat, stype, split1, split2  FROM current  Where split2 = 'N/A'),
    Measures AS     (SELECT DISTINCT stage, IID, pop, type, plat, stype, split1, split2, mType FROM current)


--SELECT stage, 

    SELECT IID as ukprn, name,

        (SELECT pop as pop_category, type, plat,

        -- SINGLES

            (SELECT stype as sCat, split1 as attribute,

                (SELECT mType,

                    (SELECT DISTINCT mDetail, stream1, stream2, stream3, stream4, stream5,

                    FROM current as da
                    WHERE da.stage  = mT.stage
                    AND da.IID      = mT.IID
                    AND da.pop      = mT.pop
                    AND da.type     = mT.type
                    AND da.plat     = mT.plat
                    AND da.stype    = mT.stype
                    AND da.split1   = mT.split1
                    AND da.split2   = mT.split2
                    AND da.mType    = mT.mType
                    FOR XML PATH ('data'), TYPE) [*]


                FROM Measures as mT
                WHERE mT.stage  = sST.stage
                AND mT.IID      = sST.IID
                AND mT.pop      = sST.pop
                AND mT.type     = sST.type
                AND mT.plat     = sST.plat
                AND mT.stype    = sST.stype
                AND mT.split1   = sST.split1
                --AND mT.split2 = sST.split2
                AND mT.split2 = 'N/A'
                FOR XML PATH ('measures'), TYPE) [*]


            FROM Singles as sST
            WHERE sST.stage = Po.stage
            AND sST.IID     = Po.IID
            AND sST.pop     = Po.pop
            AND sST.type    = Po.type
            AND sST.plat    = Po.plat
            FOR XML PATH ('singles'), TYPE) [*],



        FROM pop as Po
        WHERE Po.stage  = Pr.stage
        AND Po.IID      = '27'
        FOR XML PATH ('pop'), TYPE) [*]


    FROM Sources as Pr
    WHERE Pr.stage = 'Primary'
    FOR XML PATH ('source'),



root('ReleaseData')

The dataset (xml.MainDataset) is very big and has hundreds of unique IID values, however if I break the query down into two parts they are very fast: filtering the main dataset to a single IID takes a second. Running the query on a dataset that only contains one IID takes only a second.

What is happening here? IS the 'current' dataset being created multiples times?

Thom A
  • 88,727
  • 11
  • 45
  • 75
user2980115
  • 45
  • 2
  • 9
  • Try GROUP BY , see https://stackoverflow.com/questions/18132039/sql-server-two-level-group-by-with-xml-output – Serg May 31 '19 at 09:53
  • I've had a go but I'm not really sure what I should be Grouping here? Just the innermost query? – user2980115 May 31 '19 at 14:35
  • Yes, you need only most detailed query, `Measures` and a stack of `group by` . – Serg May 31 '19 at 16:59
  • Ok great, I'll try that on monday. – user2980115 May 31 '19 at 19:29
  • I grouped by mDetail, stream1-5 but it doesn't seem to have sped it up at all. The problem, I think, is that the inner select statement that I've grouped by are all unique anyway. This is essentially the data, and the preceding columns are defining the population with these data. – user2980115 Jun 03 '19 at 08:21

0 Answers0