With perceptron learning, I am really confused about initializing and updating the weights. I have sample data with 2 inputs, x0 and x1, and 80 rows of these 2 inputs, hence an 80x2 matrix.

Do I need to initialize the weights as an 80x2 matrix, or just 2 values w0 and w1? Is the final goal of perceptron learning to find 2 weights w0 and w1 that fit all 80 input sample rows?

I have the following code, and my error never gets to 0, even after 10,000 iterations.

x=input matrix of 80x2
y=output matrix of 80x1
n = number of iterations
w = [0.1, 0.1]
learningRate = 0.1

for i in range(n):
    expectedT = y.transpose()
    xT = x.transpose()
    prediction = np.dot(w, xT)

    for i in range(len(x)):
        if prediction[i] >= 0:
            ypred[i] = 1
        else:
            ypred[i] = 0

    error = expectedT - ypred

    # updating the weights
    w = np.add(w, learningRate * (np.dot(error, x)))
    globalError = globalError + np.square(error)
Pit Digger
  • You have 80 samples with 2 features. For each feature you will train one weight if you use a single-layer perceptron, hence you need only 2 weights. But it is also common to introduce a bias term with another weight, so you would use 3 weights. Also, it is possible that your error never reaches 0, since a single-layer perceptron cannot solve XOR, for example. – mjspier Nov 13 '19 at 15:01
  • @mjspier Thank you for confirming on the weights. In the code I shared, once a sample row produces no error in an iteration, do I need to ignore that sample row in the next iteration? – Pit Digger Nov 13 '19 at 16:52

1 Answer


For each feature you will have one weight, so with two features you need two weights. It also helps to introduce a bias, which adds a third weight; for more information about the bias, see Role of Bias in Neural Networks. The weights should indeed learn to fit the sample data as well as possible. Depending on the data, this can mean that you will never reach an error of 0: a single-layer perceptron cannot learn an XOR gate when using a monotonic activation function, for example (solving XOR with single layer perceptron).
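
To make the decision rule and the update rule with a bias explicit, a single training step looks roughly like this (just a minimal sketch with made-up names and values such as x_sample, y_sample and b, not code from your question):

import numpy as np

# current weights, bias and learning rate (made-up values)
w = np.array([0.1, 0.1])
b = 0.1
learningRate = 0.01

# one sample row and its expected output
x_sample = np.array([0, 1])
y_sample = 1

# decision rule: predict 1 if w·x + b >= 0, else 0
z = np.dot(w, x_sample) + b
y_hat = 1 if z >= 0 else 0

# perceptron update for this one sample
w = w + learningRate * (y_sample - y_hat) * x_sample
b = b + learningRate * (y_sample - y_hat)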

For your example I would recommend two things: introducing a bias, and stopping the training when the error is below a certain threshold or reaches 0, for example.

I completed your example to learn a logical OR gate:

import numpy as np

# OR input and output
x = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([0,1,1,1])

n = 1000
w = [0.1, 0.1, 0.1]   # two feature weights plus one bias weight (w[2])
learningRate = 0.01
globalError = 0

def predict(X):
    # apply the decision rule w·x + bias >= 0 to every column (sample) of X
    prediction = np.dot(w[0:2], X) + w[2]
    ypred = np.zeros(len(y))
    for i in range(len(y)):
        if prediction[i] >= 0:
            ypred[i] = 1
        else:
            ypred[i] = 0
    return ypred

for i in range(n):
    expectedT = y.transpose()
    xT = x.transpose()
    ypred = predict(xT)

    error = expectedT - ypred
    # stop once every sample is classified correctly
    # (check each entry; positive and negative errors could cancel in a plain sum)
    if not np.any(error):
        break

    # updating the weights
    w[0:2] = np.add(w[0:2], learningRate*(np.dot(error, x)))
    w[2] += learningRate*sum(error)
    globalError = globalError + np.square(error)

After the training, the error is 0:

print(error)
# [0. 0. 0. 0.]

And the weights are as follows:

print(w)
#[0.1, 0.1, -0.00999999999999999]
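
You can verify these weights by hand: for the input [0,0] the activation is 0.1*0 + 0.1*0 - 0.01 = -0.01 < 0, so the output is 0, while for [0,1], [1,0] and [1,1] the activation is at least 0.1 - 0.01 = 0.09 >= 0, so the output is 1.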

The perceptron can now be used as an OR gate:

predict(x.transpose())
#array([0., 1., 1., 1.])

Hope that helps

mjspier