I have the following table:
CREATE TABLE #TEMP (ColA VARCHAR(MAX), ColB VARCHAR(MAX), Date date, Value int)
INSERT INTO #TEMP VALUES('A','B','7/1/2010','11143274')
INSERT INTO #TEMP VALUES('A','B','7/1/2010','13303527')
INSERT INTO #TEMP VALUES('A','B','7/1/2010','17344238')
INSERT INTO #TEMP VALUES('A','B','7/1/2010','13236525')
INSERT INTO #TEMP VALUES('A','B','7/1/2010','10825232')
INSERT INTO #TEMP VALUES('A','B','7/1/2010','13567253')
INSERT INTO #TEMP VALUES('A','B','7/1/2010','10726342')
INSERT INTO #TEMP VALUES('A','B','7/1/2010','11605647')
INSERT INTO #TEMP VALUES('A','B','7/2/2010','13236525')
INSERT INTO #TEMP VALUES('A','B','7/2/2010','10825232')
INSERT INTO #TEMP VALUES('A','B','7/2/2010','13567253')
INSERT INTO #TEMP VALUES('A','B','7/2/2010','10726342')
INSERT INTO #TEMP VALUES('A','B','7/2/2010','11605647')
INSERT INTO #TEMP VALUES('A','B','7/2/2010','17344238')
INSERT INTO #TEMP VALUES('A','B','7/2/2010','17344238')
INSERT INTO #TEMP VALUES('A','B','7/2/2010','17344238')
SELECT * FROM #TEMP
DROP TABLE #TEMP
In R (a statistical software), to calculate the 95th percentile value of the last column, I am doing something like this:
ddply(data, c("ColA", "ColB", "Date"), summarize, Value95=quantile(Value, 0.95))
and the output is the following:
A B 2010-07-01 16022293
A B 2010-07-02 17344238
All this is doing is performing a GROUP BY
operation on ColA
, ColB
and Date
and applying an aggregate function quantile
function. So far so good but I should have a way to do this in SQL Server because this is an aggregate operation that can cleanly be done in SQL and when the data is in the order of millions, I really want to do this in SQL than a statistical software.
My problem is I am not able to find a good way to write the quantile function itself. I tried using NTILE but it does not make sense using NTILE(100)
when the number of rows under a particular GROUP BY
is less than 100. Is there a good way to do this?
UPDATE: Some more output from R
if it helps:
> quantile(c(1,2,3,4,5,5), 0.95)
95%
5
> quantile(c(1,2,3,4,5,5), 0.0)
0%
1
> quantile(c(1,2,3,4,5,5), 1.0)
100%
5
> quantile(c(1,2,3,4,5,5), 0.5) // MEDIAN
50%
3.5