
If I have code that takes a struct variable as input and manipulates its elements, how can I parallelize this using CUDA?

void BackpropagateLayer(NET* Net, LAYER* Upper, LAYER* Lower)
{
  INT  i,j;
  REAL Out, Err;

  for (i=1; i<=Lower->Units; i++) {
    Out = Lower->Output[i];
    Err = 0;
    for (j=1; j<=Upper->Units; j++) {
      Err += Upper->Weight[j][i] * Upper->Error[j];
    }
    Lower->Error[i] = Net->Gain * Out * (1-Out) * Err;
  }
}
Where NET and LAYER are structs defined as:
typedef struct {                     /* A LAYER OF A NET:                     */
        INT           Units;         /* - number of units in this layer       */
        REAL*         Output;        /* - output of ith unit                  */
        REAL*         Error;         /* - error term of ith unit              */
        REAL**        Weight;        /* - connection weights to ith unit      */
        REAL**        WeightSave;    /* - saved weights for stopped training  */
        REAL**        dWeight;       /* - last weight deltas for momentum     */
} LAYER;
typedef struct {                     /* A NET:                                */
        LAYER**       Layer;         /* - layers of this net                  */
        LAYER*        InputLayer;    /* - input layer                         */
        LAYER*        OutputLayer;   /* - output layer                        */
        REAL          Alpha;         /* - momentum factor                     */
        REAL          Eta;           /* - learning rate                       */
        REAL          Gain;          /* - gain of sigmoid function            */
        REAL          Error;         /* - total net error                     */
} NET;

What I could think of is to first flatten the 2D Weight array into 1D and then send it to a kernel to take the product, or just use the cuBLAS library. Any suggestions?
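
For the first option, something along these lines is roughly what I have in mind (just a sketch; d_weight, d_upperErr, etc. are placeholder device-pointer names, the weights are flattened row-major, and the 1-based indexing of the host code is dropped):

typedef float REAL;                  /* assuming REAL is float, as on the host */

/* One thread per lower-layer unit; weight[j][i] becomes d_weight[j * lowerUnits + i] */
__global__ void BackpropagateLayerKernel(const REAL* d_weight,   /* upperUnits x lowerUnits */
                                         const REAL* d_upperErr, /* upperUnits              */
                                         const REAL* d_lowerOut, /* lowerUnits              */
                                         REAL*       d_lowerErr, /* lowerUnits              */
                                         int upperUnits, int lowerUnits, REAL gain)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= lowerUnits) return;

  REAL out = d_lowerOut[i];
  REAL err = 0;
  for (int j = 0; j < upperUnits; j++) {
    err += d_weight[j * lowerUnits + i] * d_upperErr[j];   /* the inner j-loop */
  }
  d_lowerErr[i] = gain * out * (1 - out) * err;
}

which would then be launched with something like BackpropagateLayerKernel<<<(lowerUnits + 255) / 256, 256>>>(...).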

user878944
  • possible duplicate of [this](http://stackoverflow.com/questions/9309195/copying-a-struct-containing-pointers-to-cuda-device) – Sagar Masuti Nov 11 '13 at 06:05
  • @SagarMasuti Perhaps not totally a duplicate? He also has to deal with passing double pointers, see [cuda 2D array problem](http://stackoverflow.com/questions/6137218/cuda-2d-array-problem), I think? – Vitality Nov 11 '13 at 06:29
  • Agreed!! Or maybe both. – Sagar Masuti Nov 11 '13 at 07:00
  • Anyway, following talonmies' answer to the post I linked to, it seems advisable to flatten `Weight` into a linear array, as the poster also suggested. Concerning the use of cuBLAS, neural network training with backpropagation usually employs cuBLAS internally. – Vitality Nov 11 '13 at 09:01

1 Answer


If you are implementing your own neural network library, then for simple cases (nets with fully connected or sparse layers) I strongly recommend using CUBLAS/CUSPARSE. In that case, all three basic linear operations can be expressed elegantly as calls to those libraries:

  1. Feed forward: gemv (gemm if the mini-batch size > 1)
  2. Back prop: gemv (gemm if the mini-batch size > 1) with the appropriate transpose flags (see the sketch after this list).
  3. Weight updates: ger (gemm if the mini-batch size > 1).
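
For example, step 2 can be sketched with a single gemv call, assuming REAL is float and the weight matrix is stored column-major on the device as an (upperUnits x lowerUnits) matrix (the names here are placeholders, not part of the original code):

#include <cublas_v2.h>

/* d_lowerErr = W^T * d_upperErr, i.e. the inner j-loop of BackpropagateLayer for all i at once */
void BackpropErrors(cublasHandle_t handle,
                    const float* d_W,        /* upperUnits x lowerUnits, column-major  */
                    const float* d_upperErr, /* length upperUnits                      */
                    float*       d_lowerErr, /* length lowerUnits (output)             */
                    int upperUnits, int lowerUnits)
{
  const float alpha = 1.0f, beta = 0.0f;
  cublasSgemv(handle, CUBLAS_OP_T,
              upperUnits, lowerUnits,
              &alpha, d_W, upperUnits,
              d_upperErr, 1,
              &beta, d_lowerErr, 1);
  /* The Gain * Out * (1 - Out) factor is then applied by a small element-wise kernel (not shown). */
}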

Momentum can be expressed with three basic operations (or as a separate kernel for better performance; see the sketch below). Things get much more interesting when you move beyond the basics and start adding things like convolutional layers and so on. Neural nets have a gazillion hyper-parameters, so I would suggest looking at an existing implementation such as convnet to see how to design your library.
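
As an illustration, a separate momentum-update kernel could look roughly like this (a sketch with hypothetical names; d_grad holds the current gradient, d_dW the previous weight deltas, and eta and alpha correspond to Net->Eta and Net->Alpha):

typedef float REAL;                  /* assuming REAL is float */

/* One thread per weight: delta = eta * gradient + alpha * previous delta */
__global__ void MomentumUpdate(REAL* d_W, REAL* d_dW, const REAL* d_grad,
                               int n, REAL eta, REAL alpha)
{
  int k = blockIdx.x * blockDim.x + threadIdx.x;
  if (k >= n) return;

  REAL delta = eta * d_grad[k] + alpha * d_dW[k];
  d_W[k]  += delta;   /* apply the update           */
  d_dW[k]  = delta;   /* remember it for next step  */
}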

Alexey Kamenev