If you want to do this for just the current problem, it is super easy. Note that
Epoch_index = floor((iteration_index * batch_size) / (# data_samples))
Now, in solver.cpp, find the line where Caffe prints Iteration ..., loss = .... Just compute the epoch index using the formula above and print it too. You are done. Do not forget to recompile Caffe.
If you want to modify Caffe so that it always shows the epoch index, you will first need to compute the total data size across all your HDF5 files. Glancing at the Caffe HDF5 layer code, I think you can get the number of data samples in one file from hdf_blobs_[0]->shape(0). You should add this up over all HDF5 files and use that total in solver.cpp.
The variable hdf_blobs_ is defined in layers/hdf5_data_layer.cpp. I believe it is populated by the helpers in util/hdf5.cpp. I think the flow goes like this:
- In layers/hdf5_data_layer.cpp, the HDF5 filenames are read from the text file.
- Then the function LoadHDF5FileData loads the HDF5 data into blobs.
- Inside LoadHDF5FileData, the blob variable hdf_blobs_ is declared, and it is populated by the helpers in util/hdf5.cpp.
- Inside util/hdf5.cpp, the function hdf5_load_nd_dataset first calls hdf5_load_nd_dataset_helper, which reshapes the blobs accordingly. I think this is where you get the dimensions of your data for one HDF5 file. Iterating over multiple HDF5 files is done in the void HDF5DataLayer<Dtype>::Next() function in layers/hdf5_data_layer.cpp, so that is where you would add up the data dimensions from each file.
Finally, you need to figure out how to pass that total back up to solver.cpp.