Unable to get Linear Regression Cofficients in R after Successfully finding Q & R via Householder

Question

I am manually trying to calculate the regression coefficients rather than using any default for data http://people.sc.fsu.edu/~jburkardt/datasets/regression/x31.txt

Here is my code, which properly produces Q&R satisfying A=QR. But I am not able to find coefficients as dimensions of Q & R creating issues. Can any experts help me here? When I have proper Q & R how finding coefficients can go wrong?

library(xlsx)
data.df<-read.xlsx("regression.xlsx",2,header = F)
#Remove unneccesary Index position
data.df<-data.df[2:5]

#Decomposing
#coefficients [b]=inv(t(X)(Matrix Multiplication)(X))(Matrix Multiplication)t(X)(Matrix Multiplication)y
Y<-as.matrix(data.df[4])
#But note that if you need to express Y=Xb, the coefficient of b_0 must be X_0 
#which is 1
X_0<-as.data.frame(c(1,1,1,1,1,1,1,1,1,1))
X<-(cbind(X_0,data.df[1:3]))
names(X)<-c("X1","X2","X3","X4")
X<-as.matrix(X)
#Create copy for final evaluvations
A<-X  


x1<-as.matrix(X[,1])
deter_x<-sqrt(sum(x1^2))
n=dim(x1)[1]
deter_e1<-as.matrix(c(deter_x,rep(0,n-1)))
v1=x1+deter_e1
#c_i=2/(transpose(v_i)%*%(v_i))
c1<-as.numeric(2/(t(v1)%*%v1))
#H_i = I - c_i*v_i%*%transpose(v_i)
I<-diag(n)
H1<-I-c1*(v1%*%t(v1))
R1<-H1%*%X
#Check R1 and see if it is Upper Triangle Matrix
R1
#We will take rest of the interesting portion of matrix R1.
n=dim(R1)[1]
X<-as.matrix(as.data.frame(cbind(R1[2:n,2],R1[2:n,3],R1[2:n,4])))
x1<-as.matrix(X[,1])
deter_x<-sqrt(sum(x1^2))
n=dim(x1)[1]
deter_e1<-as.matrix(c(deter_x,rep(0,n-1)))
v1=x1-deter_e1
#c_i=2/(transpose(v_i)%*%(v_i))
c1<-as.numeric(2/(t(v1)%*%v1))
#H_i = I - c_i*v_i%*%transpose(v_i)
I<-diag(n)
H2<-I-c1*(v1%*%t(v1))
R2<-H2%*%X

#Check R2 and see if it is Upper Triangle Matrix, if no go for R3
n=dim(R2)[1]
X<-as.matrix(as.data.frame(cbind(R2[2:n,2],R2[2:n,3])))
x1<-as.matrix(X[,1])
deter_x<-sqrt(sum(x1^2))
n=dim(x1)[1]
deter_e1<-as.matrix(c(deter_x,rep(0,n-1)))
v1=x1+deter_e1
#c_i=2/(transpose(v_i)%*%(v_i))
c1<-as.numeric(2/(t(v1)%*%v1))
#H_i = I - c_i*v_i%*%transpose(v_i)
I<-diag(n)
H3<-I-c1*(v1%*%t(v1))
R3<-H3%*%X
R3

#Check R3 and see if it is Upper Triangle Matrix, if no go for R4
n=dim(R3)[1]
X<-as.matrix(as.data.frame(cbind(R3[2:n,2])))
x1<-as.matrix(X[,1])
deter_x<-sqrt(sum(x1^2))
n=dim(x1)[1]
deter_e1<-as.matrix(c(deter_x,rep(0,n-1)))
v1=x1-deter_e1
#c_i=2/(transpose(v_i)%*%(v_i))
c1<-as.numeric(2/(t(v1)%*%v1))
#H_i = I - c_i*v_i%*%transpose(v_i)
I<-diag(n)
H4<-I-c1*(v1%*%t(v1))
R4<-H4%*%X
R4
#As we can see R4 has all values except first element as zero
#Let us replace Matrices iteratively in R1 from R2 to R4 and round it of
R1[2:10,2:4]<-R2
R1[3:10,3:4]<-R3
R1[4:10,4]<-R4
R<-round(R1,5)
R
#Find Complete H1
#Q=H1%*%H2%*%H3%*%H4
H1_COM<-H1
# 
H_temp<-diag(10)
n=dim(H_temp)[1]
dim(H_temp[2:n,2:n])
dim(H2)
H_temp[2:n,2:n]<-H2
H2_COM<-H_temp
H2_COM
H_temp<-diag(10)
n=dim(H_temp)[1]
dim(H_temp[3:n,3:n])
dim(H3)
H_temp[3:n,3:n]<-H3
H3_COM<-H_temp
H3_COM

H_temp<-diag(10)
n=dim(H_temp)[1]
dim(H_temp[4:n,4:n])
dim(H4)
H_temp[4:n,4:n]<-H4
H4_COM<-H_temp
Q=H1_COM%*%H2_COM%*%H3_COM%*%H4_COM
# The following code properly reconstructs A Matrix proving proper Q & R
A=round(Q%*%R)
# When you try to find coefficients using Q&R you will end up in error.
solve(R)%*%t(Q)%*%Y
#Error in solve.default(R) : 'a' (10 x 4) must be square
#So trying to get matrix R without all 0 rows R[1:4,1:4]
solve(R[1:4,1:4])%*%t(Q)%*%Y
#Error in solve(R[1:4, 1:4]) %*% t(Q) : non-conformable arguments
dim(solve(R[1:4,1:4]))
dim(solve(R[1:4,1:4]))
#4 4
dim(t(Q))
#[1] 10 10
dim(Y)
#10  1

score 2 · Accepted Answer · edited May 23 '17 at 12:09

2

I would like to point you to this thread which I have answered (rather comprehensively): Multiple regression analysis in R using QR decomposition. Whether your question will be seen as a duplicate will be judged by the community. Note, among the options there, the last one is using a QR factorization written in plain R code myself.

Given that your toy code for QR factorization is correct (as you commented along your code), the major issue is with your last few lines.

The solution is simple:

solve(R) %*% (t(Q) %*% Y)[1:4,]

1:4 selects the first 4 elements in the single-column matrix t(Q) %*% Y.

If you check with my linked answer, you will see that instead of solve, I am using backsolve as this is a triangular system of equations.

You will also spot that I use crossprod when I can instead of t and %*%. My recent answer When is 'crossprod' preferred to '%*%', and when isn't? has a comprehensive discussion on these two fashions.

edited May 23 '17 at 12:09

Community

1
1

answered Mar 16 '17 at 08:16

Zheyuan Li

71,365
17
180
248

1

WoW! That was fast. Thanks. Just to modify, your answer is right provided I change my R to square matrix of 4X4 for given example: solve(R[1:4,1:4]) %*% (t(Q) %*% Y)[1:4,]. Only one thing I want to understand is, how mathematically it is right to select first 4 elements in the single-column matrix from t(Q) %*% Y, giving right results? I am definitely going to read your other answers, but if you can help me to give answer whenever you are free, that is a great help! Thanks again! Cheers! – Hari Prasad Mar 16 '17 at 08:40
1

You know what! You are amazing! Thanks for all the help. I have marked it as answer and upvoted it. I think key takeaway from your answer is "It shows how the orthogonal transform to Y is used. ". Thanks again. :-) – Hari Prasad Mar 16 '17 at 09:16
Hi Zheyuan Li, I apologize for my lack of knowledge. Still I am not able to understand how taking only top 4 of the singular column matrix is right? I read all literature that you referred. But how leaving rest of the values will give me right results, I am not able to understand :-( – Hari Prasad Mar 20 '17 at 09:30

Unable to get Linear Regression Cofficients in R after Successfully finding Q & R via Householder

1 Answers1