
I am attempting to understand the excellent Code given as a guide by Andrej Karpathy: https://gist.github.com/karpathy/d4dee566867f8291f086

I am new to python, still learning!

I am doing the best I can to understand the following code from the link:

# perform parameter update with Adagrad
for param, dparam, mem in zip([Wxh, Whh, Why, bh, by], 
                              [dWxh, dWhh, dWhy, dbh, dby], 
                              [mWxh, mWhh, mWhy, mbh, mby]):
    mem += dparam * dparam
    param += -learning_rate * dparam / np.sqrt(mem + 1e-8) # adagrad update

I have read up on the zip function and done some short tests to try to understand how this works.

What I know so far: there are 5 iterations; param == Wxh on the first iteration but not on the ones after that...

Ideally I am trying to convert this code to C#, and to do that I need to understand it.

Referring to Python iterator and zip, it appears as if we are multiplying each item of each array:

 param = Wxh * dWxh * mWxh

But then the variables param, dparam and mem are modified outside the zip function.

How do these variables function in this for loop scenario?

Rusty Nail

6 Answers


Writing a simple for loop with zip will help you learn a lot.

For example:

for a, b, c in zip([1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]):
    print(a)
    print(b)
    print(c)
    print("/")

This will print: 1 4 7 / 2 5 8 / 3 6 9 /

So the zip function just puts those three lists together, and the three variables param, dparam, mem refer to items from the different lists.

In each iteration, those three variables refer to a certain item in their corresponding lists, just like for i in [1, 2, 3]:.

This way, you only need to write one for loop, instead of writing the update once for each parameter: Wxh, Whh, Why, bh, by.

In the first iteration, only Wxh is updated, using dWxh and mWxh, following the Adagrad rule. In the second iteration, Whh is updated using dWhh and mWhh, and so on.
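To see why the updates stick, here is a minimal sketch (using hypothetical toy arrays in place of the RNN weights): += on a NumPy array modifies it in place, so the arrays inside the lists change too.

```python
import numpy as np

# Toy stand-ins for two parameters, their gradients and Adagrad memory.
Wxh, Whh = np.ones(2), np.ones(2)
dWxh, dWhh = np.full(2, 0.5), np.full(2, 0.5)
mWxh, mWhh = np.zeros(2), np.zeros(2)
learning_rate = 0.1

for param, dparam, mem in zip([Wxh, Whh], [dWxh, dWhh], [mWxh, mWhh]):
    mem += dparam * dparam                                  # in-place: mWxh/mWhh change
    param += -learning_rate * dparam / np.sqrt(mem + 1e-8)  # in-place: Wxh/Whh change

print(Wxh)   # no longer all ones: the update happened in place
print(mWxh)  # the accumulated squared gradients were kept
```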

Marshall7
  • Thank you, excellent explanation, however, how do these variables: param, dparam, mem work, how is the math being done and where is the result saved? I am guessing back in the original lists? This is a complex beast for my beginner skill level! – Rusty Nail Apr 05 '17 at 06:31
  • 1
    [Wxh, Whh, Why, bh, by] will update in this for loop. [dWxh, dWhh, dWhy, dbh, dby] only been used in for loop, but it will update out side this for loop: loss, dWxh, dWhh, dWhy, dbh, dby, hprev = lossFun(inputs, targets, hprev) in each forward computing. And the changes of [mWxh, mWhh, mWhy, mbh, mby] will also be saved, since it is the the sum of the squares of the gradients up to time step. – Marshall7 Apr 05 '17 at 06:55

What does zip do?

Quoting from the official documentation:

Zip returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The returned list is truncated in length to the length of the shortest argument sequence.

It means,

 >>> zip(["A", "B"], ["C", "D"], ["E", "F"])
 [('A', 'C', 'E'), ('B', 'D', 'F')]

So now, when you are looping through, you actually have a list of tuples, with content like:

 # These are strings here but in your case these are objects
 m = [('Wxh', 'dWxh', 'mWxh'), ('Whh', 'dWhh', 'mWhh'), ('Why', 'dWhy', 'mWhy'),
      ('bh', 'dbh', 'mbh'), ('by', 'dby', 'mby')]

What I know so far: there are 5 iterations; param == Wxh on the first iteration but not on the ones after that...

You are right. Now let's analyze your loop.

  for param, dparam, mem in m:
      print(param, dparam, mem)

  # Which prints
('Wxh', 'dWxh', 'mWxh')
('Whh', 'dWhh', 'mWhh')
('Why', 'dWhy', 'mWhy')
('bh', 'dbh', 'mbh')
('by', 'dby', 'mby')

Which means, on every iteration, param gets the zeroth element of each tuple, dparam gets the first, and mem gets the second.

Now when I type param outside the scope of the for loop, I get

   >>> param
   'by'

It means param still holds the reference to the by object.

From official documentation:

The for-loop makes assignments to the variable(s) in the target list. [...] Names in the target list are not deleted when the loop is finished, but if the sequence is empty, they will not have been assigned to at all by the loop.
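A quick check of that behaviour (using toy strings in place of the parameter objects): the loop variables keep their last binding after the loop finishes.

```python
# Toy strings standing in for the parameter objects.
for param, dparam, mem in zip(['Wxh', 'Whh'], ['dWxh', 'dWhh'], ['mWxh', 'mWhh']):
    pass

# After the loop, the names still point at the items from the last tuple.
print(param, dparam, mem)  # Whh dWhh mWhh
```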

Charul

Any sequence (or iterable) can be unpacked into variables using a simple assignment operation. The only requirement is that the number of variables and structure match the sequence. For example:

t = (2, 4)
x, y = t

In this case, as per the standard documentation: "zip() Make an iterator that aggregates elements from each of the iterables. Returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables." So, for your case:

for param, dparam, mem in zip([Wxh, Whh, Why, bh, by], 
                              [dWxh, dWhh, dWhy, dbh, dby], 
                              [mWxh, mWhh, mWhy, mbh, mby]):
    mem += dparam * dparam
    param += -learning_rate * dparam / np.sqrt(mem + 1e-8)

let's say:
iterable1 = [Wxh, Whh, Why, bh, by]
iterable2 = [dWxh, dWhh, dWhy, dbh, dby]
iterable3 = [mWxh, mWhh, mWhy, mbh, mby]

Here zip() returns [(Wxh, dWxh, mWxh), (Whh, dWhh, mWhh), (Why, dWhy, mWhy), (bh, dbh, mbh), (by, dby, mby)]

on 1st iteration:
param, dparam, mem = (Wxh, dWxh, mWxh)
so, 
param = Wxh
dparam = dWxh
mem = mWxh
mem = mem + (dparam * dparam) = mWxh + (dWxh * dWxh)
param = param + (-learning_rate * dparam / np.sqrt(mem + 1e-8)) = Wxh + (-learning_rate * dWxh / np.sqrt(mWxh + (dWxh * dWxh) + 1e-8))

on 2nd iteration:
param, dparam, mem = (Whh, dWhh, mWhh)
so, 
param = Whh
dparam = dWhh
mem = mWhh
and so on.
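If it helps, the same walkthrough can be checked with concrete numbers (toy one-element arrays, values chosen arbitrarily):

```python
import numpy as np

learning_rate = 0.1
Wxh = np.array([1.0]);  dWxh = np.array([2.0]);  mWxh = np.array([0.0])
Whh = np.array([3.0]);  dWhh = np.array([1.0]);  mWhh = np.array([5.0])

for param, dparam, mem in zip([Wxh, Whh], [dWxh, dWhh], [mWxh, mWhh]):
    mem += dparam * dparam
    param += -learning_rate * dparam / np.sqrt(mem + 1e-8)

# 1st iteration: mWxh = 0 + 2*2 = 4;  Wxh = 1 - 0.1*2/sqrt(4 + 1e-8) = 0.9
print(mWxh, Wxh)
# 2nd iteration: mWhh = 5 + 1*1 = 6;  Whh = 3 - 0.1*1/sqrt(6 + 1e-8)
print(mWhh, Whh)
```

Because the updates are in-place NumPy operations, Wxh, Whh, mWxh and mWhh themselves hold the new values after the loop.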
JkShaw
  • Excellent explanation, Thank You! Many excellent answers, but this is the most detailed and answers what I was looking for. Thank You to All! – Rusty Nail Apr 05 '17 at 07:26

Python treats the variables merely as labels or name tags. Since you have zipped those lists into a list of tuples, it doesn't matter where the objects are, as long as you address them by their name / label correctly. Kindly note, this may not work for immutable types like int or str, etc. Refer to this answer for more explanation - Immutable vs Mutable types.
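A short sketch of that mutable/immutable distinction (toy values): in-place += on a NumPy array changes the original object, while += on an int only rebinds the loop variable.

```python
import numpy as np

arrays = [np.array([1.0]), np.array([2.0])]
numbers = [1, 2]

for arr, n in zip(arrays, numbers):
    arr += 10   # mutates the array object itself, so `arrays` changes
    n += 10     # rebinds the name n; the int inside `numbers` is untouched

print(arrays)   # the arrays changed: [array([11.]), array([12.])]
print(numbers)  # the ints did not: [1, 2]
```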

shad0w_wa1k3r

Thank you all for excellent answers!

My python skill is poor, so I am sorry for that!

import numpy as np

print('----------------------------------------')
print('Before modification:')
a = np.random.randn(1, 3) * 1.0
print('a: ', a)
b = np.random.randn(1, 3) * 1.0
print('b: ', b)
c = np.random.randn(1, 3) * 1.0
print('c: ', c)

print('----------------------------------------')

for a1, b1, c1 in zip([a, b, c], [a, b, c], [a, b, c]):
    a1 += 10 * 0.01
    b1 += 10 * 0.01
    c1 += 10 * 0.01
    print('a1 is Equal to a: ', np.array_equal(a1, a))
    print('a1 is Equal to b: ', np.array_equal(a1, b))
    print('a1 is Equal to c: ', np.array_equal(a1, c))
    print('----------------------------------------')

print('After modification:')
print('a: ', a)
print('b: ', b)
print('c: ', c)
print('----------------------------------------')

Outputs:

----------------------------------------
Before modification:
a:  [[-0.79535459 -0.08678677  1.46957521]]
b:  [[-1.05908792 -0.90121069  1.07055281]]
c:  [[ 1.18976226  0.24700716 -0.08481322]]
----------------------------------------
a1 is Equal to a:  True
a1 is Equal to b:  False
a1 is Equal to c:  False
----------------------------------------
a1 is Equal to a:  False
a1 is Equal to b:  True
a1 is Equal to c:  False
----------------------------------------
a1 is Equal to a:  False
a1 is Equal to b:  False
a1 is Equal to c:  True
----------------------------------------
After modification:
a:  [[-0.69535459  0.01321323  1.56957521]]
b:  [[-0.95908792 -0.80121069  1.17055281]]
c:  [[ 1.28976226  0.34700716  0.01518678]]

jyotish is exactly right, and answered what I was missing! Thank You!

For C# I think I will look at a Parallel.For implementation here.

EDIT:

For others learning also, I also found it helpful to see this code work:

import numpy as np

print('----------------------------------------')
print('Before modification:')
a = np.random.randn(1, 3) * 1.0
print('a: ', a)
b = np.random.randn(1, 3) * 1.0
print('b: ', b)
c = np.random.randn(1, 3) * 1.0
print('c: ', c)

print('----------------------------------------')

for a1, b1, c1 in zip([a, b, c], [a, b, c], [a, b, c]):
    a1[0][0] = 10 * 0.01
    print('a1 is Equal to a: ', np.array_equal(a1, a))
    print('a1 is Equal to b: ', np.array_equal(a1, b))
    print('a1 is Equal to c: ', np.array_equal(a1, c))
    print('----------------------------------------')

print('After modification:')
print('a: ', a)
print('b: ', b)
print('c: ', c)
print('----------------------------------------')

Outputs:

----------------------------------------
Before modification:
a:  [[-0.78734047 -0.04803815  0.20810081]]
b:  [[ 1.88121331  0.91649695  0.02482977]]
c:  [[-0.24219954 -0.10183608  0.85180522]]
----------------------------------------
a1 is Equal to a:  True
a1 is Equal to b:  False
a1 is Equal to c:  False
----------------------------------------
a1 is Equal to a:  False
a1 is Equal to b:  True
a1 is Equal to c:  False
----------------------------------------
a1 is Equal to a:  False
a1 is Equal to b:  False
a1 is Equal to c:  True
----------------------------------------
After modification:
a:  [[ 0.1        -0.04803815  0.20810081]]
b:  [[ 0.1         0.91649695  0.02482977]]
c:  [[ 0.1        -0.10183608  0.85180522]]
----------------------------------------

As you can see, only the first element of each <class 'numpy.ndarray'> is modified. It's a reasonably deep operation.

Rusty Nail

Here is the same code on C#:

    public void UpdateParametersWithAdagrad(WordGenerationRNNLossFunResultModel lossFunResultModel, Matrix mWxh, Matrix mWhh, Matrix mWhy, Matrix  mbh, Matrix mby, double learning_rate)
    {
        //mem += dparam * dparam;
        //param += -learning_rate * dparam / np.sqrt(mem + 1e-8); // adagrad update

        var param = new List<Matrix> { Wxh, Whh, Why, bh, by };
        var dparam = new List<Matrix> { lossFunResultModel.DWxh, lossFunResultModel.DWhh, lossFunResultModel.DWhy, lossFunResultModel.Dbh, lossFunResultModel.Dby };
        var mem = new List<Matrix> { mWxh, mWhh, mWhy, mbh, mby };

        for (int i = 0; i < dparam.Count; i++)
        {
            mem[i] += dparam[i] * dparam[i];
            param[i] += -learning_rate * dparam[i] / (mem[i] + 1e-8).Sqrt(); // adagrad update
            // Caution: unlike NumPy's in-place +=, if Matrix's operators return
            // new instances, these assignments only replace the entries of the
            // local lists; the original Wxh, Whh, ... are updated only if Matrix
            // mutates in place (or if you write the results back afterwards).
        }
    }
  • Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. – Simas Joneliunas Dec 20 '22 at 01:24