PyTorch Linear regression 1x1d, consistently wrong slope

Question

I am learning PyTorch and decided to implement a very simple one-to-one linear regression, from height to weight.

I used this dataset: https://www.kaggle.com/datasets/mustafaali96/weight-height but any other would do nicely.

Let's import the libraries and load the data for females:

import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv('weight-height.csv',sep=',')
#https://www.kaggle.com/datasets/mustafaali96/weight-height
height_f=df[df['Gender']=='Female']['Height'].to_numpy()
weight_f=df[df['Gender']=='Female']['Weight'].to_numpy()
plt.scatter(height_f, weight_f, c ="red",alpha=0.1)
plt.show()

This gives a nice scatter plot of the measured females:

So far, so good.

Let's make a DataLoader:

class Data(Dataset):
  def __init__(self, X: np.ndarray, y: np.ndarray) -> None:
    # need to convert float64 to float32 else
    # will get the following error
    # RuntimeError: expected scalar type Double but found Float
    self.X = torch.from_numpy(X.reshape(-1, 1).astype(np.float32))
    self.y = torch.from_numpy(y.reshape(-1, 1).astype(np.float32))    
    self.len = self.X.shape[0]  
  def __getitem__(self, index: int) -> tuple:
    return self.X[index], self.y[index]  
  def __len__(self) -> int:
    return self.len

traindata = Data(height_f, weight_f)
batch_size = 500
num_workers = 2
trainloader = DataLoader(traindata, 
                         batch_size=batch_size, 
                         shuffle=True, 
                         num_workers=num_workers)

...a linear regression model...

class linearRegression(torch.nn.Module):
    def __init__(self, inputSize, outputSize):
        super(linearRegression, self).__init__()
        self.linear = torch.nn.Linear(inputSize, outputSize)
        

    def forward(self, x):
        out = self.linear(x)
        return out
model = linearRegression(1, 1)
criterion = torch.nn.MSELoss() 
optimizer = torch.optim.SGD(model.parameters(), lr=0.00001)

...and let's train it:

epochs=10
for epoch in range(epochs):
    print(epoch)
    for i, (inputs, labels) in enumerate(trainloader):
        
        outputs=model(inputs)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

This prints 0 through 9. Now let's see what our model gives:

range_height_f=torch.linspace(height_f.min(),height_f.max(),150)

plt.scatter(height_f, weight_f, c ="red",alpha=0.1)
pred=model(range_height_f.reshape(-1, 1))
plt.scatter(range_height_f, pred.detach().numpy(), c ="green",alpha=0.1)

...

Why does it do this? Why is the slope wrong, and consistently so? Whatever I change (optimizer, batch size, epochs, females to males), it gives me this very wrong slope, and I really don't understand why.

Edit 1: Added the loss; here is the plot:

Edit 2: I decided to explore a bit and fitted a regression with scikit-learn:

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
X_train, X_test, y_train, y_test = train_test_split(height_f, weight_f, test_size = 0.25)

regr = LinearRegression()
regr.fit(X_train.reshape(-1,1), y_train)
plt.scatter(height_f, weight_f, c ="red",alpha=0.1)
range_pred=regr.predict(range_height_f.reshape(-1, 1))
range_pred
plt.scatter(range_height_f, range_pred, c ="green",alpha=0.1)

This gives the following regression, which looks nice:

t = torch.from_numpy(height_f.astype(np.float32))
p=regr.predict(t.reshape(-1,1))
p=torch.from_numpy(p).reshape(-1,1)


w= torch.from_numpy(weight_f.astype(np.float32)).reshape(-1,1)

print(criterion(p,w).item())

However, in this case the criterion is 100.65161998527695.

PyTorch, in turn, converges to about 210.

Edit 3: Changed the optimizer from SGD to Adam:

#optimizer = torch.optim.SGD(model.parameters(), lr=0.00001)
optimizer = torch.optim.Adam(model.parameters(), lr=0.5)

The learning rate is larger in this case, which yields an interesting but consistent result. Here is the loss, and here is the proposed regression:

And here is the log of the loss for the Adam optimizer as well:

Have you tried normalizing your inputs & outputs? In my experience NNs struggle when dealing with large input/output values — yhenon, Dec 19 '22 at 17:44
This is very interesting. Have you checked what algorithm scikit-learn uses to solve the linear regression? I suspect it's not SGD. — kmkurn, Dec 21 '22 at 08:12

Shai · Accepted Answer · 2022-12-29T07:13:51.803

I think your issue stems from the data not being centered around zero.

See this thread for another example where "centering" the data prior to training has a huge effect on the convergence of SGD optimization.

Update (Dec 29th, 2022):

TL;DR
It's all about normalization/initialization.

In detail:
Your data is not centered around 0 and it is not scaled "nicely". This makes optimization very difficult for SGD (and all of its variants).

In this answer I showed how centering the training data (subtracting the mean and dividing by the std) solves this problem.
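To illustrate the effect of centering on a small scale, here is a dependency-free sketch in plain Python. It uses synthetic data with roughly the same mean and spread as the female heights and weights (the numbers 6.0, -246.0 and the noise level are assumptions for illustration, not values from the Kaggle file), and a hypothetical helper `gradient_descent` that runs full-batch gradient descent on the MSE.

```python
import random
import statistics

random.seed(0)

# Synthetic stand-in for the height/weight data (assumed values, not the Kaggle file).
n = 200
xs = [random.gauss(63.7, 2.7) for _ in range(n)]
ys = [6.0 * x - 246.0 + random.gauss(0.0, 10.0) for x in xs]


def gradient_descent(xs, ys, lr, steps):
    """Full-batch gradient descent on MSE for y = a*x + b, starting from a = b = 0."""
    a, b = 0.0, 0.0
    count = len(xs)
    for _ in range(steps):
        errs = [a * x + b - y for x, y in zip(xs, ys)]
        grad_a = 2.0 / count * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / count * sum(errs)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Closed-form least-squares slope, used as the reference.
m_x, m_y = statistics.fmean(xs), statistics.fmean(ys)
s_x, s_y = statistics.pstdev(xs), statistics.pstdev(ys)
ols_slope = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys)) / (n * s_x ** 2)

# 1) Raw data with a small learning rate, as in the question.
raw_slope, raw_bias = gradient_descent(xs, ys, lr=1e-5, steps=100)

# 2) Standardized data: subtract the mean and divide by the std, then train.
xn = [(x - m_x) / s_x for x in xs]
yn = [(y - m_y) / s_y for y in ys]
a_n, b_n = gradient_descent(xn, yn, lr=0.1, steps=100)

# Map the parameters learned on standardized data back to the original units.
std_slope = a_n * s_y / s_x
std_bias = m_y + s_y * b_n - std_slope * m_x

print("least-squares slope:    ", ols_slope)
print("GD slope, raw data:     ", raw_slope)
print("GD slope, standardized: ", std_slope)
```

In this sketch the run on raw data barely moves the bias away from zero within 100 steps, so the slope settles on a compromise value, while the standardized run recovers the least-squares slope.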

Here I'll show you how to leave your data as-is, but change the initialization of the weights to solve your problem.

Let m_x, s_x be the mean and std of X, and m_y, s_y be the mean and std of y.
When PyTorch initializes the weights a and b of the linear layer y = aX + b, it assumes X and y have zero mean and unit variance. This is NOT the case here. Far from it.
Therefore, we need to re-adjust the initial a and b accordingly.
Here's the math for it:
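A sketch of the algebra, reconstructed from the definitions above. This is an assumption about the intended derivation; a' and b' are hypothetical names for the slope and bias of a linear layer that works on normalized data.

```latex
\frac{y - m_y}{s_y} = a'\,\frac{X - m_x}{s_x} + b'
\quad\Longrightarrow\quad
y = \underbrace{\frac{s_y}{s_x}\,a'}_{a}\,X
  + \underbrace{m_y + s_y\,b' - \frac{s_y}{s_x}\,a'\,m_x}_{b}

% Special case a' = 1, b' = 0:
b = m_y - \frac{s_y}{s_x}\,m_x
  = s_y\left(\frac{m_y}{s_y} - \frac{m_x}{s_x}\right)
```

With a' = 1 and b' = 0 the bias expression agrees with the one used in the code. Note that this algebra puts the factor s_y / s_x on the slope, whereas the code multiplies the initial weight by sig_x / sig_y.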

And the code:

mu_x, sig_x, mu_y, sig_y = traindata.X.mean().item(), traindata.X.std().item(), traindata.y.mean().item(), traindata.y.std().item()
# just for fun, here are the values:
# (63.7087, 2.6962, 135.8601, 19.0225)

# start a fresh model and adjust its initial values:
model = linearRegression(1, 1)
model.linear.weight.data *= (sig_x / sig_y)
model.linear.bias.data = sig_y * (-(mu_x/sig_x)+(mu_y/sig_y))

# now you are good to go! continue optimizing like you originally did:
# init an optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.00001)

# optimize for 10 epochs (now you don't need this much, you can even increase the learning rate...)
epochs=10
for epoch in range(epochs):
    print(epoch)
    for i, (inputs, labels) in enumerate(trainloader):
        
        outputs=model(inputs)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The loss curve looks like this:
And the optimizer converged to

In []: loss.item()
Out[]: 100.9453125

Similar to that of sklearn.linear_model.LinearRegression.

Plotting the prediction on the data:

Same thing. Added edit 3 with Adam as optimizer. it converges to 210 in very similar way — Timo Junolainen, Dec 28 '22 at 17:23
seems comprehensible, I am just at very uncomfortable desktop solution :-( sorry if I missed idea due to this. I absolutely accept answer, dunno why I missed it, but seems I did. I — Timo Junolainen, Dec 29 '22 at 12:35
Thanks! I had a somewhat naive understanding. This explains the consistently similar wrong tilt. Thanks a lot! — Timo Junolainen, Dec 29 '22 at 12:37

score 2 · Answer 2 · answered Dec 27 '22 at 14:01

The issue seems to be feature scaling/centering. Classic linear regression does not rely on gradient descent, so it can derive the solution without any scaling.

For SGD, however, it is much harder to converge on unscaled data.

Try adding this before implementing the Dataset:

from sklearn.preprocessing import StandardScaler
height_f = StandardScaler().fit_transform(height_f.reshape(-1, 1))

I was able to achieve a good result using learning rate of 0.1 after that.
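One practical detail when scaling the heights this way: anything fed to the model later (such as the grid used for plotting) has to go through the same fitted scaler. Below is a minimal sketch on synthetic data (the generating numbers are assumptions, not the Kaggle file), with scikit-learn's `LinearRegression` standing in for a trained model so the example stays short.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the female heights and weights (assumed values).
height = rng.normal(63.7, 2.7, size=500)
weight = 6.0 * height - 246.0 + rng.normal(0.0, 10.0, size=500)

# Fit the scaler once on the training heights and keep the object around.
scaler = StandardScaler().fit(height.reshape(-1, 1))
height_scaled = scaler.transform(height.reshape(-1, 1))

# One model trained on scaled heights, one on raw heights for comparison.
model_scaled = LinearRegression().fit(height_scaled, weight)
model_raw = LinearRegression().fit(height.reshape(-1, 1), weight)

# The prediction grid is in original units, so transform it before predicting.
grid = np.linspace(height.min(), height.max(), 150).reshape(-1, 1)
pred_scaled = model_scaled.predict(scaler.transform(grid))
pred_raw = model_raw.predict(grid)

print(np.allclose(pred_scaled, pred_raw))
```

Scaling only changes the coordinates the model works in, so both models describe the same line once the grid is transformed consistently.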

answered Dec 27 '22 at 14:01

dx2-66


I've added Edit 3 with optim.Adam, and it still converges to 210+- :-( – Timo Junolainen Dec 28 '22 at 17:22
Adam is but an extension of the same old gradient descent. It still converges much better with centered data. – dx2-66 Dec 28 '22 at 19:18

titfortat · Answer 3 · 2022-12-19T16:23:36.853

0

As far as I can see, the code works as intended. I suggest adding an intercept term, though.

Just for clarification, I do not add code to my answer as I believe the issue is purely a theoretical one. Read up on the simple linear regression model. If the data has a non-zero mean (as is the case here), you can't possibly match both the mean and the slope of the data with only one coefficient.

edited Dec 19 '22 at 16:23

answered Dec 19 '22 at 14:58

titfortat


torch.nn.Linear has bias=True by default so that should account for the intercept I believe – yhenon Dec 19 '22 at 16:26

Alexander L. Hayes · Answer 4 · 2023-01-02T04:49:30.727

-1

It looks like the data loader + SGD is not handling the intercept properly. You should try adding a column of 1's to the data.

scikit-learn Linear Regression and SGDRegressor behave this way too if you set fit_intercept=False:

Minimal Reproducible Example for linear regression + no intercept.

from sklearn.linear_model import LinearRegression
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv("weight-height.csv")

X = df.Height.to_numpy().reshape(-1, 1)
y = df.Weight.to_numpy()

lr = LinearRegression(fit_intercept=False).fit(X, y)

X_test = np.linspace(df.Height.min(), df.Height.max()).reshape(-1, 1)
y_pred = lr.predict(X_test)

plt.scatter(df.Height, df.Weight, alpha=0.1)
plt.plot(X_test, y_pred, color="black")

edited Jan 02 '23 at 04:49

answered Jan 02 '23 at 03:45

Alexander L. Hayes


As pointed out in [a comment](https://stackoverflow.com/questions/74852107/pytorch-linear-regression-1x1d-consistantly-wrong-slope/74978851#comment132100151_74852211), by default `nn.Linear` has a bias/intercept term. So there shouldn’t be a need to add a column of 1’s. – kmkurn Jan 02 '23 at 04:30
I already saw that comment. The currently-accepted answer also recommends centering the data. Centering the data is usually a good idea since it makes the optimization better-defined, but a bias term should *usually* handle the centering problem. In the [scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html): **Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.** – Alexander L. Hayes Jan 02 '23 at 04:41
I think we agree on what the `nn.Linear` documentation says. But documentation is frequently outdated or incorrect, and the answer I've added here *suggests* that something else may need to be investigated. – Alexander L. Hayes Jan 02 '23 at 04:43
@AlexanderL.Hayes The PyTorch documentation is okay. `nn.Linear` has a `bias` (aka "intercept"). The issue here is that its initial value does not match the strong bias in the input data. This can be adjusted. See the details in my answer. – Shai Jan 02 '23 at 11:53
