
After training on a dataset of 10K sentences, the .index, .meta, and .data files of my saved model are 3KB, 58MB, and 375MB respectively.

With the same network architecture trained on a dataset of 100K sentences, the file sizes are 3KB, 139MB, and 860MB.

This suggests that the file sizes depend on the size of the dataset. According to this answer, however, the file sizes should be independent of the dataset size, since the architecture of the neural network is the same.

Why is there such a huge difference in the sizes?

I would also like to know what else these files contain apart from what is mentioned in the linked answer.

Do these files contain information about the training history, such as loss values at each step?

Stuxen
    There is no way to explain this without code. In general, the size of the model is independent of the number of data points in the training set, but maybe your code couples the two in some way (such as a vocabulary learned from the training set). – Dr. Snoopy Mar 05 '19 at 13:02
    Thanks a lot! You were right. The "embeddings" variable is storing the word embeddings. Anyways, is there a way to get the training losses at each step from these files? I haven't written the summary to the event files. – Stuxen Mar 05 '19 at 16:02

2 Answers

# List the name and shape of every variable stored in the checkpoint.
from tensorflow.python.training import checkpoint_utils as cp
cp.list_variables('./model.ckpt-12520')

Running the above snippet gives the following output:

[('Variable', []), ('decoder/attention_wrapper/attention_layer/kernel', [600, 300]), ('decoder/attention_wrapper/attention_layer/kernel/Adam', [600, 300]), ('decoder/attention_wrapper/attention_layer/kernel/Adam_1', [600, 300]), ('decoder/attention_wrapper/bahdanau_attention/attention_b', [300]), ('decoder/attention_wrapper/bahdanau_attention/attention_b/Adam', [300]), ('decoder/attention_wrapper/bahdanau_attention/attention_b/Adam_1', [300]), ('decoder/attention_wrapper/bahdanau_attention/attention_g', []), ('decoder/attention_wrapper/bahdanau_attention/attention_g/Adam', []), ('decoder/attention_wrapper/bahdanau_attention/attention_g/Adam_1', []), ('decoder/attention_wrapper/bahdanau_attention/attention_v', [300]), ('decoder/attention_wrapper/bahdanau_attention/attention_v/Adam', [300]), ('decoder/attention_wrapper/bahdanau_attention/attention_v/Adam_1', [300]), ('decoder/attention_wrapper/bahdanau_attention/query_layer/kernel', [300, 300]), ('decoder/attention_wrapper/bahdanau_attention/query_layer/kernel/Adam', [300, 300]), ('decoder/attention_wrapper/bahdanau_attention/query_layer/kernel/Adam_1', [300, 300]), ('decoder/attention_wrapper/basic_lstm_cell/bias', [1200]), ('decoder/attention_wrapper/basic_lstm_cell/bias/Adam', [1200]), ('decoder/attention_wrapper/basic_lstm_cell/bias/Adam_1', [1200]), ('decoder/attention_wrapper/basic_lstm_cell/kernel', [900, 1200]), ('decoder/attention_wrapper/basic_lstm_cell/kernel/Adam', [900, 1200]), ('decoder/attention_wrapper/basic_lstm_cell/kernel/Adam_1', [900, 1200]), ('decoder/dense/kernel', [300, 49018]), ('decoder/dense/kernel/Adam', [300, 49018]), ('decoder/dense/kernel/Adam_1', [300, 49018]), ('decoder/memory_layer/kernel', [300, 300]), ('decoder/memory_layer/kernel/Adam', [300, 300]), ('decoder/memory_layer/kernel/Adam_1', [300, 300]), ('embeddings', [49018, 300]), ('embeddings/Adam', [49018, 300]), ('embeddings/Adam_1', [49018, 300]), ('loss/beta1_power', []), ('loss/beta2_power', []), 
('stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/basic_lstm_cell/bias', [600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam', [600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam_1', [600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/basic_lstm_cell/kernel', [450, 600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam', [450, 600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam_1', [450, 600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/basic_lstm_cell/bias', [600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam', [600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam_1', [600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/basic_lstm_cell/kernel', [450, 600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam', [450, 600]), ('stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam_1', [450, 600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/bw/basic_lstm_cell/bias', [600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam', [600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/bw/basic_lstm_cell/bias/Adam_1', [600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/bw/basic_lstm_cell/kernel', [450, 600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam', [450, 600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/bw/basic_lstm_cell/kernel/Adam_1', [450, 600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/fw/basic_lstm_cell/bias', [600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam', [600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/fw/basic_lstm_cell/bias/Adam_1', [600]), 
('stack_bidirectional_rnn/cell_1/bidirectional_rnn/fw/basic_lstm_cell/kernel', [450, 600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam', [450, 600]), ('stack_bidirectional_rnn/cell_1/bidirectional_rnn/fw/basic_lstm_cell/kernel/Adam_1', [450, 600])]
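The listing also shows that every trainable variable comes with two extra entries ending in /Adam and /Adam_1, which are the Adam optimizer's per-variable moment estimates. As a rough sketch in plain Python, using a few (name, shape) pairs copied from the output above (the helper names num_elements and is_adam_slot are made up for this example), the element counts can be split into model weights and optimizer slots:

```python
# A few (name, shape) pairs copied from the list_variables output above.
variables = [
    ('decoder/dense/kernel', [300, 49018]),
    ('decoder/dense/kernel/Adam', [300, 49018]),
    ('decoder/dense/kernel/Adam_1', [300, 49018]),
    ('embeddings', [49018, 300]),
    ('embeddings/Adam', [49018, 300]),
    ('embeddings/Adam_1', [49018, 300]),
    ('decoder/attention_wrapper/basic_lstm_cell/kernel', [900, 1200]),
    ('decoder/attention_wrapper/basic_lstm_cell/kernel/Adam', [900, 1200]),
    ('decoder/attention_wrapper/basic_lstm_cell/kernel/Adam_1', [900, 1200]),
]


def num_elements(shape):
    """Number of scalars in a tensor of the given shape ([] means a scalar)."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def is_adam_slot(name):
    """True for the optimizer slot entries that accompany each weight."""
    return name.endswith('/Adam') or name.endswith('/Adam_1')


weight_elements = sum(num_elements(s) for n, s in variables if not is_adam_slot(n))
slot_elements = sum(num_elements(s) for n, s in variables if is_adam_slot(n))

print('weights:   ', weight_elements)
print('Adam slots:', slot_elements)
```

For this subset the optimizer slots hold exactly twice as many values as the weights themselves, so a checkpoint saved with the optimizer state is roughly three times the size of the weights alone.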

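I realised that the embeddings variable stores the word embeddings, with one row per vocabulary word (shape [49018, 300] here). The vocabulary is built from the training data, so a larger dataset gives a larger vocabulary and larger files. The same holds for decoder/dense/kernel and for the Adam entries of both variables.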

cp.load_variable('./model.ckpt-12520', 'embeddings')
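To get a feel for how much of the .data file these vocabulary-dependent tensors can account for, here is a back-of-the-envelope estimate. It assumes the variables are stored as 32-bit floats (not confirmed by the listing, which shows only shapes) and counts six tensors: embeddings and decoder/dense/kernel, each with its /Adam and /Adam_1 entries. The function name is made up for this example.

```python
BYTES_PER_FLOAT32 = 4  # assumption: variables are stored as float32


def vocab_dependent_bytes(vocab_size, embedding_dim=300, tensors=6):
    """Bytes used by tensors whose shape includes the vocabulary size.

    tensors=6 covers embeddings and decoder/dense/kernel, each stored
    together with its two Adam slot variables.
    """
    return vocab_size * embedding_dim * BYTES_PER_FLOAT32 * tensors


total = vocab_dependent_bytes(49018)
print(f"{total} bytes = {total / 1e6:.1f} MB")  # 352929600 bytes = 352.9 MB
```

Because the estimate is linear in vocab_size, a checkpoint trained on data with twice the vocabulary would need twice the space for these tensors, regardless of the rest of the architecture.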
Stuxen

Training summaries, such as loss values, are stored in event files, not in the checkpoint files.
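Note that saver.save only writes checkpoints, so loss values reach an event file only if the training loop writes summaries explicitly. A minimal sketch of that, using the TF1-style API through tf.compat.v1; the toy loss, the variable name w, and the temporary log directory are placeholders for illustration and not part of the model discussed here:

```python
import glob
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

logdir = tempfile.mkdtemp()  # placeholder log directory

# Toy stand-in for the real model's loss tensor.
w = tf1.get_variable("w", shape=[], initializer=tf1.zeros_initializer())
loss = tf.square(w - 3.0)
train_op = tf1.train.AdamOptimizer(0.1).minimize(loss)

# Register the loss as a scalar summary and merge all summaries into one op.
tf1.summary.scalar("loss", loss)
merged = tf1.summary.merge_all()

with tf1.Session() as sess:
    writer = tf1.summary.FileWriter(logdir, sess.graph)
    sess.run(tf1.global_variables_initializer())
    for step in range(5):
        # Run one training step and fetch the serialized summary with it.
        _, summary = sess.run([train_op, merged])
        writer.add_summary(summary, global_step=step)
    writer.close()

event_files = glob.glob(os.path.join(logdir, "events.out.tfevents.*"))
print(event_files)
```

Pointing TensorBoard at the log directory then shows the loss curve. Losses from steps that ran before any summaries were written cannot be recovered this way.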

YoungChul
  • I am saving the model as saver.save(sess, "./saved_model/model.ckpt", global_step=step). I have no event files as of now. Will I have to write the loss in the event file manually? Or is the loss written to it implicitly? – Stuxen Mar 05 '19 at 12:41