Hi, I want to do a comprehensive analysis of regression techniques, so I will keep editing this question. I am trying to solve a regression problem using the techniques available in Matlab. Ideally, I would like to look at techniques such as:
- Linear Regression
- Logistic Regression
- Bayesian Regression
- Support Vector Regression
- Gaussian Process for Regression
Problem Statement
Given the data X and Y of size 333x128 and 333x1 respectively, where 333 is the number of training examples and 128 is the feature dimension. The problem I am solving is a regression problem, not a classification one. I intend to do all of the above in Matlab.
Linear Regression
The code for linear regression is given below. It loads the "hald" dataset and uses the first 10 observations for training and the remaining 3 for testing. The last line prints the actual labels (first column) next to the predicted values (second column).
clc; clear all; close all;
load hald
X = ingredients; % Predictor variables
y = heat; % Response
mdl = fitlm(X(1:10,:),y(1:10,:));
predicted_values = feval(mdl,X(11:end,:));
[y(11:end,:) predicted_values]
The output is:
ans =
83.8000 80.2845
113.3000 112.8545
109.4000 112.5293
However, can anyone explain what is meant by a Generalized Linear Regression Model? In Matlab, there are two pairs of commands for this: glmfit/glmval and fitglm/feval.
The code for applying the generalized linear regression model is given below:
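% Approach 1: fitglm with a quadratic model specification
mdl = fitglm(X(1:10,:),y(1:10,:),'quadratic');
predicted_values = feval(mdl,X(11:end,:));
sse = sum((y(11:end,:)-predicted_values).^2)  % named sse so the built-in error() is not shadowed
% Approach 2: glmfit with a normal distribution and identity link
[b, dev] = glmfit(X(1:10,:),y(1:10,:),'normal','link','identity');
predicted_values = glmval(b,X(11:end,:),'identity');
sse = sum((y(11:end,:)-predicted_values).^2)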
What is the difference between the two operations?
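One way to compare the two interfaces like for like is to give both the same model: a linear predictor, a normal distribution, and an identity link. The following is only a minimal sketch, assuming the Statistics and Machine Learning Toolbox and the same "hald" split as above; the names Xtrain, Xtest, coef_diff and pred_diff are introduced here for illustration.

```matlab
load hald
X = ingredients;                 % predictors (13x4)
y = heat;                        % response (13x1)
Xtrain = X(1:10,:);  ytrain = y(1:10);
Xtest  = X(11:end,:);

% fitglm with a linear (not quadratic) specification, normal distribution, identity link
mdl = fitglm(Xtrain, ytrain, 'linear', 'Distribution', 'normal', 'Link', 'identity');

% glmfit with the same distribution and link; its predictor is always linear in X
b = glmfit(Xtrain, ytrain, 'normal', 'link', 'identity');

% Both return the intercept first, then one coefficient per column of X
coef_diff = max(abs(mdl.Coefficients.Estimate - b));

% Predictions on the held-out rows from each interface
pred_fitglm = predict(mdl, Xtest);
pred_glmfit = glmval(b, Xtest, 'identity');
pred_diff = max(abs(pred_fitglm - pred_glmfit));

fprintf('max coefficient difference: %g\n', coef_diff);
fprintf('max prediction difference:  %g\n', pred_diff);
```

With matching settings the two fits should agree up to numerical precision. In the snippet above, by contrast, the fitglm call uses 'quadratic', which for 4 predictors adds 6 interaction terms and 4 squared terms (15 coefficients in total), so it is a different model from the glmfit call and has more coefficients than the 10 training rows.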
Also, glmfit has arguments called distr and link. What does the distribution mean here? How do I choose the best distribution? For the above example, based only on the data, how does one estimate the distribution a priori?
Also, as I understand it, the link function establishes a link between the linear model and the response variable. Does that mean logistic regression is a special case of the generalized linear regression model? I read through the details at the wiki link but could not clear my doubts.
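For reference, the usual textbook definition of a generalized linear model can be written as below. The symbols $\mu$, $g$ and $\beta$ are notation introduced here for illustration and are not taken from the Matlab documentation.

```latex
% Response distribution (the "distr" choice) and link function g (the "link" choice)
y \mid x \sim \text{exponential-family distribution with mean } \mu,
\qquad g(\mu) = x^{\top}\beta

% Ordinary linear regression: normal distribution, identity link
g(\mu) = \mu

% Logistic regression: binomial (Bernoulli) distribution, logit link
g(\mu) = \log\frac{\mu}{1-\mu}
```

Under this definition, linear regression and logistic regression are both members of the same family and differ only in the distribution and the link.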
Support Vector Regression
The code for Support Vector Regression is given below. Here I have the option to standardize the data. The kernel I have chosen is the rbf kernel with automatic kernel scaling. Other options such as the polynomial and linear kernels are also available (gaussian is the same kernel as rbf).
mdl = fitrsvm(X(1:10,:),y(1:10,:),'KernelFunction','rbf','KernelScale','auto','Standardize',true);
predicted_values = predict(mdl,X(11:end,:));
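To put the SVR result on the same footing as the other models, the test error can be computed in the same way and compared across kernels. This is only a sketch, assuming the same "hald" split as above; the variable names kernels and sse and the fixed random seed are introduced here for illustration.

```matlab
load hald
X = ingredients;  y = heat;
Xtrain = X(1:10,:);   ytrain = y(1:10);
Xtest  = X(11:end,:); ytest  = y(11:end);

rng(0);  % 'KernelScale','auto' uses a subsampling heuristic, so fix the seed

kernels = {'linear', 'rbf', 'polynomial'};   % 'gaussian' is the same kernel as 'rbf'
sse = zeros(numel(kernels), 1);

for k = 1:numel(kernels)
    % Same options as the call above, only the kernel changes
    mdl = fitrsvm(Xtrain, ytrain, ...
        'KernelFunction', kernels{k}, ...
        'KernelScale', 'auto', ...
        'Standardize', true);
    pred = predict(mdl, Xtest);
    sse(k) = sum((ytest - pred).^2);         % sum of squared errors on the 3 test rows
    fprintf('%-10s  SSE = %.4f\n', kernels{k}, sse(k));
end
```

With only 3 test rows the ranking of kernels is not reliable; on the full 333x128 data a cross-validated comparison would be more informative.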
Logistic Regression
I am unable to use logistic regression to solve this regression problem. I have gone through various sources, and they always solve a classification problem, but my label space is continuous, not discrete. In this wiki article it is explicitly stated that "As such it is not a classification method." However, based on the answers here and here, it seems to me that logistic regression can only be used for classification?
I have also gone through the mnrfit/mnrval tutorials, but they also deal with classification problems.
Please provide a small example based on my data above showing how logistic regression can be used for regression.