
I am using Caffe with the HDF5 layer. It reads my hdf5list.txt as

/home/data/file1.h5
/home/data/file2.h5
/home/data/file3.h5

In each file*.h5, I have 10,000 images, so about 30,000 images in total. In each iteration I use a batch size of 10, as set here:

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "./hdf5list.txt"
    batch_size: 10
    shuffle: true
  }
  include {
    phase: TRAIN
  }
}

The Caffe output looks like

Iteration 10, loss = 100
Iteration 20, loss = 90
...

My question is: how do I compute the number of epochs with respect to the loss? I want to plot a graph with the number of epochs on the x-axis and the loss on the y-axis.

Related link: Epoch vs iteration when training neural networks

John

1 Answer
If you want to do this for just the current problem, it is super easy. Note that

Epoch_index = floor((iteration_index * batch_size) / (# data_samples))

Now, in solver.cpp, find the line where Caffe prints Iteration ..., loss = .... Just compute the epoch index using the above formula and print it too. You are done. Do not forget to recompile Caffe.
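As a plain-Python sketch of that formula (not Caffe code; the numbers match the setup in the question, 30,000 samples with batch size 10, so one epoch is 3,000 iterations):

```python
def epoch_index(iteration, batch_size, num_samples):
    """0-based epoch that a given iteration falls in:
    floor((iteration * batch_size) / num_samples)."""
    return (iteration * batch_size) // num_samples

# 30,000 samples, batch size 10 -> 3,000 iterations per epoch
print(epoch_index(2999, 10, 30000))  # -> 0 (still in the first epoch)
print(epoch_index(3000, 10, 30000))  # -> 1 (second epoch begins)
```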

If you want to modify Caffe so that it always shows the epoch index, then you will first need to compute the total data size across all your HDF5 files. Glancing at the Caffe HDF5 layer code, I think you can get the number of data samples in one file from hdf_blobs_[0]->shape(0). You should add this up over all HDF5 files and use that total in solver.cpp.
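Outside Caffe, you can compute the same total with h5py. This is a sketch, not Caffe code, and it assumes the dataset inside each file is named `data` (matching the `top: "data"` in the layer definition); adjust the name to your files:

```python
import h5py

def total_samples(list_file, dataset="data"):
    """Sum the first dimension of `dataset` over every HDF5 file
    listed (one path per line) in list_file."""
    total = 0
    with open(list_file) as f:
        for line in f:
            path = line.strip()
            if not path:
                continue
            with h5py.File(path, "r") as h5:
                total += h5[dataset].shape[0]
    return total
```

For the question's setup (three files of 10,000 images each) this would return 30,000.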

The variable hdf_blobs_ is defined in layers/hdf5_data_layer.cpp. I believe it is populated by the functions in util/hdf5.cpp. I think this is how the flow goes:

  1. In layers/hdf5_data_layer.cpp, the hdf5 filenames are read from the text file.
  2. Then a function LoadHDF5FileData attempts to load the hdf5 data into blobs.
  3. Inside LoadHDF5FileData, the blob variable hdf_blobs_ is declared, and it is populated by the code in util/hdf5.cpp.
  4. Inside util/hdf5.cpp, the function hdf5_load_nd_dataset first calls hdf5_load_nd_dataset_helper, which reshapes the blobs accordingly. I think this is where you get the dimensions of your data for one HDF5 file. Iterating over multiple HDF5 files is done in the HDF5DataLayer<Dtype>::Next() function in layers/hdf5_data_layer.cpp, so that is where you need to add up the data dimensions received earlier.

Finally, you need to figure out how to pass that total back up to solver.cpp.
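If recompiling Caffe is more than you want, a workaround in the spirit of the question is to post-process the training log instead: parse the iteration/loss lines and convert iterations to epochs with the same formula. A sketch (the regex assumes the log format shown in the question; adjust it to your actual Caffe log lines):

```python
import re

# Matches lines like "Iteration 10, loss = 100" (optional trailing "s",
# flexible spacing around "=").
LINE_RE = re.compile(r"Iterations?\s+(\d+),\s*loss\s*=\s*([0-9.eE+-]+)")

def epochs_vs_loss(log_lines, batch_size, num_samples):
    """Return (epoch, loss) pairs ready for plotting."""
    pairs = []
    for line in log_lines:
        m = LINE_RE.search(line)
        if m:
            iteration = int(m.group(1))
            loss = float(m.group(2))
            pairs.append((iteration * batch_size / num_samples, loss))
    return pairs

log = ["Iteration 10, loss = 100", "Iteration 20, loss = 90"]
pairs = epochs_vs_loss(log, batch_size=10, num_samples=30000)
# Feed `pairs` to e.g. matplotlib.pyplot.plot for the epoch-vs-loss graph.
```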

Autonomous
  • Thanks Parag, it is a nice solution. However, as you saw, my HDF5 files are stored in a text list file. Do you think we can get the total across files from `hdf_blobs_[0]->shape(0)`? – John Apr 10 '17 at 04:42
  • That is what I said. If you want a solution just for this problem, then you know your number of data samples and you don't have to work with `hdf_blobs` etc. For details on your question, see my edit. – Autonomous Apr 10 '17 at 05:08
  • Thanks, I will try it. Have you tried printing the epoch number before? I have no idea why Caffe does not have that option, while other deep learning tools do. – John Apr 10 '17 at 05:38
  • No, I have not tried printing an epoch. My suggestion would be to hard-code the number of samples if you want to do it for just one problem/once. – Autonomous Apr 10 '17 at 05:41