2

We just have few built-in functions in SQL Server to do sophisticated statistical analysis but I need to calculate multiple linear regression in TSQL.

Based on this post (Multiple Linear Regression function in SQL Server), I could be able to get Coefficients for Intercept (Y), X1 and X2.

What I need is p-value for X1 and X2 and also R Square

Test data:

DECLARE @TestData TABLE (i INT IDENTITY(1, 1), X1 FLOAT, X2 FLOAT, y FLOAT)

INSERT @TestData
    SELECT 0, 17, 210872.3034 UNION ALL
    SELECT 0, 23, 191988.2299 UNION ALL
    SELECT 0, 18, 204564.9455 UNION ALL
    SELECT 0, 4, 189528.9212 UNION ALL
    SELECT 0, 0, 200203.6364 UNION ALL
    SELECT 11, 0, 218814.1701 UNION ALL
    SELECT 5, 0, 220109.2129 UNION ALL
    SELECT 2, 0, 214377.8534 UNION ALL
    SELECT 1, 0, 204926.9208 UNION ALL
    SELECT 0, 0, 202499.4065 UNION ALL
    SELECT 0, 3, 196917.8182 UNION ALL
    SELECT 0, 9, 202286.0012

Desired output:

R Square    0.4991599183412360
p-value X1  0.0264247876580807
p-value X2  0.7817597643898020

I have already been able to get following data from the above test data.

b               Coefficients
----------------------------------
Intercept (Y)   202119.231151577
X1 C(H)         1992.8421941724
X2 C(C)         -83.8561622730127 

I know TSQL is not a good platform to obtain this but I need it to be done purely in TSQL.

I am aware of XLeratorDB Function Packages for SQL Server

Community
  • 1
  • 1
sqluser
  • 5,502
  • 7
  • 36
  • 50
  • Why don't you simply use the method described in the post you are basing this on and write yourself a CLR wrapper for the desired Math.NET functionality? It's pretty straight forward how to do this. – Staeff May 04 '15 at 07:53
  • Thanks for your reply. I need to calculate them in TSQL. The situation I have does not allow me to use Math.NET library or any other third party app. – sqluser May 04 '15 at 09:51

1 Answers1

0

You could calculate R Squared by hand and create a variable 'R2' equal to (Nxysum - xsumysum)^2/ (Nx2sum - xsumxsum) (Ny2sum - ysumysum)?

Where xsum and ysum are the sum of your values and N is the number of observations.

The formula for R Squared is simple enough that you don't necessarily need any function or statistical software. Check out this link for calculating it by hand: http://sciencefair.math.iit.edu/analysis/linereg/hand/

You can apply the same logic to T-SQL.

knawhatimean
  • 159
  • 8