
Has anyone trained on mini-batches of data for an unsupervised learning problem? The feed_dict normally takes labels, but in an unsupervised setting there are none. How do you overcome that? Could we use fake labels that never contribute to the loss function?

Basically, I want to iterate over my huge dataset and optimize a custom loss function. However, I couldn't figure out how to retain my training parameters (weights) when explicitly feeding in a new mini-batch of the data.

For example, say the whole dataset is 6000 points and the mini-batch size is 600. Currently, for every mini-batch I can only use fresh, independent weight parameters, because the weights are initialized from the data points of that mini-batch. When we optimize the loss over the first mini-batch of 600 data points, we get some optimized weights. How does one use these weights to optimize the next mini-batch of 600 data points, and so on? The problem is that we cannot use a shared global variable.

I searched the Stack Overflow forums but couldn't find anything relevant to mini-batches over unsupervised data.

'f' is my whole dataset, say text data of N points with dimension D. 'u' holds the cluster centroids, with K clusters, again of dimension D.

I define my variables as below:

import numpy as np
import tensorflow as tf

F = tf.Variable(f.astype(np.float32), name='F')
U = tf.Variable(u.astype(np.float32), name='U')
FMod = tf.reshape(F, [N // K, K, D], name='FMod')  # integer division so the shape entry is an int
UMod = tf.reshape(U, [1, K, D], name='UMod')

Then I define a custom loss or objective function, 'objective'.
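(The actual loss is not shown here; purely for concreteness, one possible 'objective' given the FMod/UMod shapes above is a sum of squared point-to-centroid distances. This is an assumption for illustration, not the question's real loss:)

# Illustrative sketch only: broadcasting FMod [N//K, K, D] against
# UMod [1, K, D] pairs the k-th point of each group with the k-th centroid
objective = tf.reduce_sum(tf.square(FMod - UMod), name='objective')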

Next, I use an optimizer:

optimizer = tf.train.AdamOptimizer(learning_rate)
# var_list=[F, U] updates both the data embedding F and the centroids U
train_W = optimizer.minimize(objective, var_list=[F, U])

Finally, I run the training op and evaluate the objective as follows:

init_op = tf.global_variables_initializer()

with tf.Session() as sess:

    # Initialize all the variables
    sess.run(init_op)

    for n in range(noEpochs):

        # train_W returns None, so unpack to keep only the objective value
        _, objval1 = sess.run([train_W, objective])

What I am stuck on is how to iterate over mini-batches of my data 'f', which ultimately feeds the optimizer train_W. If I have a for loop over these mini-batches, I will be assigning a new variable train_W in each iteration. How can I pass the optimized weights along so they are used for the next mini-batch?

Any help or pointers in this regard would be really appreciated. Thanks in advance!

Vishal
  • Hi, did you manage to do this? I am planning to train an unsupervised network without labels too. – Mj1992 Jan 29 '18 at 08:42
  • Yes, I did it using a placeholder variable on only X, i.e. the data points, and then passing the indices in the feed_dict: idx = tf.placeholder(tf.int64, shape=(miniBatchSize)) ... sess.run([opt_var], feed_dict={idx: idxTensor}). Hope this helps. Let me know if you are stuck! – Vishal Feb 07 '18 at 20:41
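
Expanding the comment above into a minimal, self-contained sketch (the gather-based batching, the illustrative nearest-centroid loss, and names like idxTensor are assumptions, not code from the question): the data becomes a tf.constant and only the centroids U are a tf.Variable, so U's optimized state persists across sess.run calls:

import numpy as np
import tensorflow as tf

# f: [N, D] data, u: [K, D] initial centroids, as defined in the question
miniBatchSize = 600

Fconst = tf.constant(f.astype(np.float32), name='Fconst')  # data is fixed, not trained
U = tf.Variable(u.astype(np.float32), name='U')            # centroids persist across batches

idx = tf.placeholder(tf.int64, shape=(miniBatchSize,), name='idx')
batch = tf.gather(Fconst, idx)                             # [miniBatchSize, D] rows for this step

# illustrative nearest-centroid loss (an assumption; substitute the real objective)
sqDist = tf.reduce_sum(tf.square(tf.expand_dims(batch, 1) - tf.expand_dims(U, 0)), axis=2)
objective = tf.reduce_sum(tf.reduce_min(sqDist, axis=1))

opt_var = tf.train.AdamOptimizer(learning_rate).minimize(objective, var_list=[U])
init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init_op)
    for n in range(noEpochs):
        for start in range(0, N, miniBatchSize):
            idxTensor = np.arange(start, start + miniBatchSize)
            # U keeps its optimized value between sess.run calls
            _, objval = sess.run([opt_var, objective], feed_dict={idx: idxTensor})

Because the graph (including opt_var) is built once, outside the loop, no new training op is created per mini-batch; only the fed indices change from step to step.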
