
Using R, it is very easy to approximate basic functions through a neural network:

library(nnet)
x <- sort(10*runif(50))
y <- sin(x)
nn <- nnet(x, y, size=4, maxit=10000, linout=TRUE, abstol=1.0e-8, reltol = 1.0e-9, Wts = seq(0, 1, by=1/12) )
plot(x, y)
x1 <- seq(0, 10, by=0.1)
lines(x1, predict(nn, data.frame(x=x1)), col="green")
predict( nn , data.frame(x=pi/2) )

A simple neural network with a single hidden layer of just 4 neurons is sufficient to approximate a sine (as per the Stack Overflow question Approximating function with Neural Network).

But I cannot obtain the same result in PyTorch.

In fact, the neural network created by R contains not only an input, four hidden neurons and an output, but also two "bias" neurons: the first connected to the hidden layer, the second to the output.
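Written out as plain code, that structure is (as far as I understand) roughly the following; W1, b1, W2, b2 are my own labels, and I am assuming nnet's usual sigmoid hidden units with a linear output because of linout=TRUE:

import numpy as np

def r_style_net(x, W1, b1, W2, b2):
    # W1, b1: input->hidden weights and the 4 hidden biases (fed by the first "bias" neuron)
    # W2, b2: hidden->output weights and the single output bias (fed by the second "bias" neuron)
    hidden = 1.0 / (1.0 + np.exp(-(W1 * x + b1)))  # sigmoid hidden layer
    return float(np.dot(W2, hidden) + b2)          # linear output (linout=TRUE)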

schema of the network built with R

The plot above is obtained through the following:

library(devtools)
library(scales)
library(reshape)
source_url('https://gist.github.com/fawda123/7471137/raw/cd6e6a0b0bdb4e065c597e52165e5ac887f5fe95/nnet_plot_update.r')
plot.nnet(nn$wts,struct=nn$n, pos.col='#007700',neg.col='#FF7777')   ### this plots the graph
plot.nnet(nn$wts,struct=nn$n, pos.col='#007700',neg.col='#FF7777', wts.only=1)   ### this prints the weights 

Attempting the same with PyTorch produces a different network: the bias neurons are missing.

The following is an attempt to do in PyTorch what was done above in R. The results are not satisfactory: the function is not approximated. The most evident difference from the R network is the absence of the bias neurons.

import torch
from torch.autograd import Variable

import random
import math

N, D_in, H, D_out = 1000, 1, 4, 1

l_x = []
l_y = []

for a in range(1000):
    r = random.random()*10
    l_x.append( [r] )
    l_y.append( [math.sin(r)] )


tx = torch.cuda.FloatTensor(l_x)
ty = torch.cuda.FloatTensor(l_y)

x = Variable(tx, requires_grad=False)
y = Variable(ty, requires_grad=False)

w1 = Variable(torch.randn(D_in, H ).type(torch.cuda.FloatTensor), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(torch.cuda.FloatTensor), requires_grad=True)

learning_rate = 1e-5
for t in range(1000):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    loss = (y_pred - y).pow(2).sum()
    if t<10 or t%100==1: print(t, loss.data[0])

    loss.backward()

    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    w1.grad.data.zero_()
    w2.grad.data.zero_()


t = [ [math.pi] ]
print( str(t) +" -> "+ str( (Variable(torch.cuda.FloatTensor( t ))).mm(w1).clamp(min=0).mm(w2).data ) )
t = [ [math.pi/2] ]
print( str(t) +" -> "+ str( (Variable(torch.cuda.FloatTensor( t ))).mm(w1).clamp(min=0).mm(w2).data ) )

How can I make the network approximate the given function (a sine in this case), either by inserting the "bias" neurons or by adding whatever other detail is missing?

Moreover, I have difficulty understanding why R inserts the "bias". I found information suggesting that the bias is akin to the intercept in a regression model, but I still do not find it clear. Any information would be appreciated. EDIT: an excellent explanation turned out to be the Stack Overflow question Role of Bias in Neural Networks.
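A small check also helped me see concretely what goes wrong without the bias (my own reasoning, so a sketch rather than an authoritative statement): the low-level model above, x.mm(w1).clamp(min=0).mm(w2), is forced through the origin, and for positive inputs it is exactly linear, so it cannot follow a sine over (0, 10):

import torch

torch.manual_seed(0)
w1 = torch.randn(1, 4)
w2 = torch.randn(4, 1)

def f(x):
    # same forward pass as in the low-level attempt above, just without training
    return torch.FloatTensor([[x]]).mm(w1).clamp(min=0).mm(w2)

# For any x > 0, clamp(min=0) zeroes exactly the same hidden units
# (those with a negative weight in w1), so f(x) = x * f(1):
print(f(3.0))
print(3 * f(1.0))   # prints the same value as the line above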


EDIT:

An example that obtains the result, though using the "fuller" framework ("not reinventing the wheel"), is as follows:

import torch
from torch.autograd import Variable
import torch.nn.functional as F

import math

N, D_in, H, D_out = 1000, 1, 4, 1

l_x = []
l_y = []

for a in range(1000):
    t = (a/1000.0)*10
    l_x.append( [t] )
    l_y.append( [math.sin(t)] )

x = Variable( torch.FloatTensor(l_x) )
y = Variable( torch.FloatTensor(l_y) )


class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.to_hidden = torch.nn.Linear(n_feature, n_hidden)
        self.to_output = torch.nn.Linear(n_hidden,  n_output)

    def forward(self, x):
        x = self.to_hidden(x)
        x = F.tanh(x)           # activation function
        x = self.to_output(x)
        return x


net = Net(n_feature = D_in, n_hidden = H, n_output = D_out)

learning_rate =  0.01 
optimizer = torch.optim.Adam( net.parameters() , lr=learning_rate )

for t in range(1000):
    y_pred = net(x) 

    loss = (y_pred - y).pow(2).sum()
    if t<10 or t%100==1: print(t, loss.data[0])

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()


t = [ [math.pi] ]
print( str(t) +" -> "+ str( net( Variable(torch.FloatTensor( t )) ) ) )
t = [ [math.pi/2] ]
print( str(t) +" -> "+ str( net( Variable(torch.FloatTensor( t )) ) ) )

Unfortunately, while this code works properly, it does not solve the matter of making the original, more "low-level" code work as expected (e.g. by introducing the bias).
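For reference, my current guess at what "introducing the bias" into the low-level version would look like: one extra learnable tensor per layer, added to that layer's pre-activation. This is only a sketch meant to slot into the loop above (it reuses x, w1, w2 and learning_rate from there), and I have not verified that it trains as well as the nnet model:

# two bias terms, one per layer, trained alongside w1 and w2
b1 = Variable(torch.zeros(1, H    ).type(torch.cuda.FloatTensor), requires_grad=True)
b2 = Variable(torch.zeros(1, D_out).type(torch.cuda.FloatTensor), requires_grad=True)

# forward pass with biases (tanh instead of clamp, mirroring the nn.Module version above)
y_pred = (x.mm(w1) + b1).tanh().mm(w2) + b2

# ...and inside the training loop the biases must be updated and zeroed too:
b1.data -= learning_rate * b1.grad.data
b2.data -= learning_rate * b2.grad.data
b1.grad.data.zero_()
b2.grad.data.zero_()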

mdp
  • Because you haven't added a bias term. Why not just use `nn.Linear()` with some non-linear activation functions provided by `nn.Module` and use `torch.optim` to optimize your model? Do not re-invent the wheel if there are simpler ways to do what you want. – jdhao Dec 09 '17 at 07:11
  • @jdhao, the reason was didactic: I had first a perplexity about "why does this not work", which I supposed was because of the absence of a bias, and then a difficulty in implementing the bias. It is in order to understand how the process works, before using the libraries. – mdp Feb 28 '18 at 12:52

1 Answer


Following up on @jdhao's comment - this is a super-simple PyTorch model that computes exactly what you want:

import torch
import torch.nn as nn


class LinearWithInputBias(nn.Linear):
    def __init__(self, in_features, out_features, out_bias=True, in_bias=True):
        # out_bias is the regular nn.Linear bias; in_bias registers a second,
        # separately learned bias vector that is added on top of it
        nn.Linear.__init__(self, in_features, out_features, out_bias)
        if in_bias:
            in_bias = torch.zeros(1, out_features)
            # in_bias.normal_()  # if you want it to be randomly initialized
            self._out_bias = nn.Parameter(in_bias)

    def forward(self, x):
        out = nn.Linear.forward(self, x)
        try:
            out = out + self._out_bias
        except AttributeError:
            # in_bias was False, so no extra bias was registered
            pass
        return out

However, there's an additional bug in your code: from what I can see, you don't train it - i.e. you do not call an optimizer (like torch.optim.SGD(mod.parameters())) before you delete the gradient information by calling grad.data.zero_().
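For instance, a full training setup with the class above could look like this (a sketch that reuses the x and y Variables from your question; I picked Adam with lr=0.01 as in your edit, but SGD works the same way with a suitably small learning rate):

model = torch.nn.Sequential(
    LinearWithInputBias(1, 4),
    torch.nn.Tanh(),
    LinearWithInputBias(4, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for step in range(1000):
    optimizer.zero_grad()               # clear old gradients before the new backward pass
    loss = (model(x) - y).pow(2).sum()  # same sum-of-squares loss as in the question
    loss.backward()
    optimizer.step()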

cleros
  • I am not sure; the implementation of bias already there in torch.nn.Linear - an extra parameter alongside the weights - should be sufficient... I do not understand the role of the extra bias ("in_bias") you added in the computation – mdp Feb 28 '18 at 12:58
  • You technically never need it, as that first bias term does not add any real functionality (you can translate any additional bias using the transformation and add it to the second bias, and then remove the first for an equivalent linear approximator). However, I understood your original question as asking how to have several bias terms in one PyTorch module. – cleros Mar 02 '18 at 23:31