I created a procedure in SQL that took a large dataset and output a large XML file. Due to memory issues I am trying to rejig it so that it runs for smaller subsets which I add together elsewhere. I added the named subquery, 'current' that filters the data down to a single IID entity then run the same code on that, but for some reason it takes a lot longer than I expected.
WITH
current AS (SELECT * FROM xml.MainDataset WHERE IID = '27'),
sources AS (SELECT DISTINCT stage, IID, name FROM current ),
pop AS (SELECT DISTINCT stage, IID, pop, type, plat FROM current ),
Singles AS (SELECT DISTINCT stage, IID, pop, type, plat, stype, split1, split2 FROM current Where split2 = 'N/A'),
Measures AS (SELECT DISTINCT stage, IID, pop, type, plat, stype, split1, split2, mType FROM current)
--SELECT stage,
SELECT IID as ukprn, name,
(SELECT pop as pop_category, type, plat,
-- SINGLES
(SELECT stype as sCat, split1 as attribute,
(SELECT mType,
(SELECT DISTINCT mDetail, stream1, stream2, stream3, stream4, stream5,
FROM current as da
WHERE da.stage = mT.stage
AND da.IID = mT.IID
AND da.pop = mT.pop
AND da.type = mT.type
AND da.plat = mT.plat
AND da.stype = mT.stype
AND da.split1 = mT.split1
AND da.split2 = mT.split2
AND da.mType = mT.mType
FOR XML PATH ('data'), TYPE) [*]
FROM Measures as mT
WHERE mT.stage = sST.stage
AND mT.IID = sST.IID
AND mT.pop = sST.pop
AND mT.type = sST.type
AND mT.plat = sST.plat
AND mT.stype = sST.stype
AND mT.split1 = sST.split1
--AND mT.split2 = sST.split2
AND mT.split2 = 'N/A'
FOR XML PATH ('measures'), TYPE) [*]
FROM Singles as sST
WHERE sST.stage = Po.stage
AND sST.IID = Po.IID
AND sST.pop = Po.pop
AND sST.type = Po.type
AND sST.plat = Po.plat
FOR XML PATH ('singles'), TYPE) [*],
FROM pop as Po
WHERE Po.stage = Pr.stage
AND Po.IID = '27'
FOR XML PATH ('pop'), TYPE) [*]
FROM Sources as Pr
WHERE Pr.stage = 'Primary'
FOR XML PATH ('source'),
root('ReleaseData')
The dataset (xml.MainDataset) is very big and has hundreds of unique IID values, however if I break the query down into two parts they are very fast: filtering the main dataset to a single IID takes a second. Running the query on a dataset that only contains one IID takes only a second.
What is happening here? IS the 'current' dataset being created multiples times?