
I have been trying to create a small neural network to learn the softmax function, following this article: https://mlxai.github.io/2017/01/09/implementing-softmax-classifier-with-vectorized-operations.html

It works well for a single iteration. But when I create a loop to train the network with updated weights, I get the following error: ValueError: operands could not be broadcast together with shapes (5,10) (1,5) (5,10).

Debugging this issue, I found that np.max() returns arrays of shape (5,1) and (1,5) at different iterations, even though axis is set to 1. Please help me identify what went wrong in the following code.

import numpy as np

N = 5
D = 10
C = 10

W = np.random.rand(D,C)
X = np.random.randint(255, size = (N,D))
X = X/255
y = np.random.randint(C, size = (N))
#print (y)
lr = 0.1

for i in range(100):
  print (i)
  loss = 0.0
  dW = np.zeros_like(W)
  N = X.shape[0]
  C = W.shape[1]

  f = X.dot(W)
  #print (f)

  print (np.matrix(np.max(f, axis=1)))
  print (np.matrix(np.max(f, axis=1)).T)
  f -= np.matrix(np.max(f, axis=1)).T
  #print (f)  

  term1 = -f[np.arange(N), y]
  sum_j = np.sum(np.exp(f), axis=1)
  term2 = np.log(sum_j)
  loss = term1 + term2
  loss /= N 
  loss += 0.5 * reg * np.sum(W * W)
  #print (loss)

  coef = np.exp(f) / np.matrix(sum_j).T
  coef[np.arange(N),y] -= 1
  dW = X.T.dot(coef)
  dW /= N
  dW += reg*W

  W = W - lr*dW
Subhanandh
    You never define `reg` in your code – FHTMitchell Mar 12 '18 at 10:57
  • Your problem is that `f` starts off as an `array`, and then later becomes a `matrix`, which causes `max` to behave differently. I would suggest [**always**](https://stackoverflow.com/questions/4151128/what-are-the-differences-between-numpy-arrays-and-matrices-which-one-should-i-u) using `np.array` instead of `np.matrix` and using `np.dot` (or `@` in Python >= 3.5) rather than `*` for matrix multiplication. – FHTMitchell Mar 12 '18 at 11:28
  • Thank you for pointing out the reason for this problem and suggesting the right operations to use. – Subhanandh Mar 12 '18 at 11:50

1 Answer


In your first iteration, W is an instance of np.ndarray with shape (D, C), so f = X.dot(W) is also an ndarray. np.max(f, axis=1) therefore returns an ndarray of shape (N,), which np.matrix() turns into shape (1, N) and which .T then transposes to (N, 1). That column shape broadcasts cleanly against f's shape (N, C).

But on the following iterations, W is an instance of np.matrix: coef is built by dividing by np.matrix(sum_j).T, so dW = X.T.dot(coef) becomes a matrix, and W = W - lr*dW inherits that. f then inherits np.matrix as well, and np.max(f, axis=1) returns an np.matrix of shape (N, 1), which passes through np.matrix() unchanged and turns into shape (1, N) after .T. Subtracting a (1, N) operand from the (N, C) matrix f is what raises the broadcast error.
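
The difference is easy to reproduce in isolation. Here is a minimal sketch with the question's shapes (N = 5, C = 10); the variable names are only illustrative:

import numpy as np

f_arr = np.random.rand(5, 10)      # plain ndarray, like f on the first iteration
f_mat = np.matrix(f_arr)           # np.matrix, like f on later iterations

print(np.max(f_arr, axis=1).shape)               # (5,)   - 1D result
print(np.matrix(np.max(f_arr, axis=1)).T.shape)  # (5, 1) - broadcasts against (5, 10)
print(np.max(f_mat, axis=1).shape)               # (5, 1) - matrix reductions stay 2D
print(np.matrix(np.max(f_mat, axis=1)).T.shape)  # (1, 5) - cannot broadcast with (5, 10)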

To fix this, make sure you don't mix np.ndarray with np.matrix. Either define everything as np.matrix from the start (e.g. W = np.matrix(np.random.rand(D, C))) or use keepdims to maintain your axes:

f -= np.max(f, axis = 1, keepdims = True)

which will let you keep everything 2D without needing to cast to np.matrix (do the same for sum_j).
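
Applied to the loop in the question, a self-contained sketch of the fix might look like this; note that reg is never defined in the question, so the value below is only an assumed placeholder:

import numpy as np

N, D, C = 5, 10, 10
W = np.random.rand(D, C)
X = np.random.randint(255, size=(N, D)) / 255
y = np.random.randint(C, size=N)
lr = 0.1
reg = 1e-3   # assumed value; `reg` is never defined in the question

for i in range(100):
    f = X.dot(W)                                      # (N, C), stays an ndarray
    f -= np.max(f, axis=1, keepdims=True)             # (N, 1) broadcasts cleanly

    sum_j = np.sum(np.exp(f), axis=1, keepdims=True)  # (N, 1), no np.matrix needed
    loss = np.mean(-f[np.arange(N), y] + np.log(sum_j[:, 0]))
    loss += 0.5 * reg * np.sum(W * W)

    coef = np.exp(f) / sum_j                          # (N, C), still an ndarray
    coef[np.arange(N), y] -= 1
    dW = X.T.dot(coef) / N + reg * W

    W = W - lr * dW                                   # W remains an ndarray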

Daniel F
  • Also, as @FHTMitchell noted in the comments, it's recommended to avoid `np.matrix` for exactly this reason. It's mostly just a crutch for people coming from `MATLAB`, and isn't well integrated with most `numpy` tools. You can do pretty much all of the same things with `np.atleast_2d()`, `keepdims = True` and `@` without having these strange side effects. – Daniel F Mar 12 '18 at 11:33
  • Thank you for helping me with this issue and explaining the difference in shapes due to np.ndarray and np.matrix. – Subhanandh Mar 12 '18 at 11:45