I am looking for a Caffe Python data layer example to learn from. I know that Fast-RCNN has a Python data layer, but it is rather complicated for me since I am not familiar with object detection.
So my question is: is there a Python data layer example from which I can learn how to define my own data preparation procedure?
For example, how do I define a Python data layer that does much more data augmentation (translation, rotation, etc.) than Caffe's "ImageDataLayer"?

Thank you very much

Shai
kli_nlpr

2 Answers

You can use a "Python" layer: a layer implemented in Python that feeds data into your net. (See an example of adding a type: "Python" layer here.)

import os
import sys

# make pycaffe importable; assumes the CAFFE_ROOT environment variable is set
sys.path.insert(0, os.path.join(os.environ['CAFFE_ROOT'], 'python'))
import caffe


class myInputLayer(caffe.Layer):
    def setup(self, bottom, top):
        # read parameters from `self.param_str`
        ...

    def reshape(self, bottom, top):
        # no "bottom"s for input layer
        if len(bottom) > 0:
            raise Exception('cannot have bottoms for input layer')
        # make sure you have the right number of "top"s
        if len(top) != ...:
            raise ...
        top[0].reshape(...)  # reshape the outputs to the proper sizes

    def forward(self, bottom, top):
        # do your magic here... feed **one** batch to `top`
        top[0].data[...] = one_batch_of_data

    def backward(self, top, propagate_down, bottom):
        # no back-prop for input layers
        pass
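
To make the "do your magic here" step more concrete, here is a minimal sketch of how one augmented batch could be assembled with NumPy. It is not taken from Caffe or from any existing layer: the helper names `random_translate` and `make_batch` are hypothetical, images are assumed to be already loaded as `(C, H, W)` arrays, and `max_shift` is assumed to be smaller than the image height and width. Only translation and horizontal flipping are shown; arbitrary rotation would need an extra image library.

```python
import numpy as np


def random_translate(img, max_shift, rng):
    """Shift a (C, H, W) image by up to max_shift pixels, zero-filling the border.

    Assumes max_shift < min(H, W).
    """
    dy, dx = rng.randint(-max_shift, max_shift + 1, size=2)
    h, w = img.shape[1:]
    out = np.zeros_like(img)
    # Overlapping source/destination windows after the shift.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[:, dst_y, dst_x] = img[:, src_y, src_x]
    return out


def make_batch(images, batch_size, max_shift, rng):
    """Sample batch_size images at random and augment each one independently."""
    idx = rng.randint(0, len(images), size=batch_size)
    batch = np.empty((batch_size,) + images[0].shape, dtype=np.float32)
    for i, j in enumerate(idx):
        img = random_translate(images[j], max_shift, rng)
        if rng.rand() < 0.5:
            img = img[:, :, ::-1]  # horizontal flip
        batch[i] = img
    return batch
```

Inside the layer, `reshape` would then call `top[0].reshape(*batch.shape)` with the known batch shape, and `forward` would assign `top[0].data[...] = make_batch(...)`.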

For more information on param_str see this thread.
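
As a rough illustration of how `param_str` can carry settings from the prototxt into `setup`, the sketch below parses it as JSON. Caffe only hands the layer a raw string, so JSON is just one possible convention; the module name `my_layers` and the keys `source`, `batch_size` and `max_shift` are made up for this example.

```python
import json

# Hypothetical prototxt entry for the layer (shown as a comment for reference):
#
# layer {
#   name: "data"
#   type: "Python"
#   top: "data"
#   python_param {
#     module: "my_layers"        # Python file that defines the layer class
#     layer: "myInputLayer"      # class name inside that module
#     param_str: '{"source": "train.txt", "batch_size": 32}'
#   }
# }

# Assumed defaults for optional keys (illustrative only).
DEFAULTS = {"batch_size": 32, "max_shift": 0}


def parse_params(param_str):
    """Turn the raw param_str into a dict, filling in defaults."""
    params = dict(DEFAULTS)
    params.update(json.loads(param_str))
    if "source" not in params:
        raise ValueError("param_str must define 'source'")
    return params
```

In `setup` this would be called as `self.params = parse_params(self.param_str)`.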
You can find a sketch of a data loading layer with pre-fetch here.
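
For a sense of what pre-fetching can look like, here is a minimal sketch using a background thread and a bounded queue. It is not the linked implementation; it assumes Python 3, and the `Prefetcher` class and the `make_batch` callable are hypothetical names. Threads mainly help when loading is I/O-bound, and error handling in the worker is left out for brevity.

```python
import queue
import threading


class Prefetcher:
    """Builds batches in a background thread so forward() rarely has to wait."""

    def __init__(self, make_batch, depth=2):
        self._make_batch = make_batch             # callable returning one batch
        self._queue = queue.Queue(maxsize=depth)  # at most `depth` batches buffered
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self._stop.is_set():
            batch = self._make_batch()
            # Retry with a timeout so a stop request is noticed while the queue is full.
            while not self._stop.is_set():
                try:
                    self._queue.put(batch, timeout=0.1)
                    break
                except queue.Full:
                    continue

    def next(self):
        """Block until a batch is ready and return it."""
        return self._queue.get()

    def close(self):
        self._stop.set()
        self._thread.join()
```

The layer would create one `Prefetcher` in `setup` and call `top[0].data[...] = self.prefetcher.next()` in `forward`.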

Shai
  • Thank you very much for your explanation, I will try to implement one and post my code here. o(^▽^)o – kli_nlpr Jan 26 '16 at 03:16
  • In fact, I found a PR in the Caffe repository: https://github.com/BVLC/caffe/pull/3471/files – kli_nlpr Jan 26 '16 at 04:15
  • Is it possible to use multithreading here, to load data quicker? – curio17 Mar 13 '17 at 03:50
  • @kishensurajP you can use Python threading module to prefetch data. – Shai Mar 13 '17 at 05:26
  • Thanks. So there has to be a loop for the total number of images in the batch within the forward method, so that top[0].data[...] outputs the full batch? – Alex Jan 14 '18 at 23:00
  • @Alex yes. Each call of forward must yield a new batch. – Shai Jan 15 '18 at 04:22
  • @Shai: that's cool. Do you know any way to speed up the loop over images in the batch? Something like itertools? – Alex Jan 16 '18 at 13:00
  • @Alex you may try to speed things up using `multiprocessing.Process` to fetch the data. See [this answer](https://stackoverflow.com/a/48065550/1714410) for a sketch. – Shai Jan 16 '18 at 13:02

@Shai's answer is great. In addition, I found another detailed example of a Python data layer in a PR to caffe-master (https://github.com/BVLC/caffe/pull/3471/files). I hope this detailed example is helpful to others.

kli_nlpr
  • Thank you very much. Do you happen to know how to configure the prototxt file? I am trying to do exactly what you asked about, but I am still confused even after looking at the code. My problem is, first, how to define the image source in the prototxt, and then how to read the different parameters from it. I would appreciate it if you could share your implementation; it would help us greatly. – Hossein Feb 08 '18 at 15:02
  • Done :) Thank you very much for your link. I followed a couple of Shai's answers and, thanks to dear God, got everything running :) – Hossein Feb 09 '18 at 19:04