I am a little confused about the difference between the conv2d and conv3d functions. For example, suppose I have a stack of N images, each with height H, width W, and 3 RGB channels. The input to the network can take two forms:
form1: (batch_size, N, H, W, 3), which is a rank-5 tensor
form2: (batch_size, H, W, 3N), which is a rank-4 tensor
My question is: if I apply conv3d with M filters of size (N, 3, 3) to form1, and conv2d with M filters of size (3, 3) to form2, do the two perform basically the same operation on the features? I think both forms convolve over the temporal and spatial dimensions.
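To make the comparison concrete, here is a minimal pure-Python sketch of what I have in mind. It is not the actual conv2d/conv3d API. It assumes a single filter (M = 1), no batch dimension, stride 1, and "valid" padding, so the temporal output of the 3D case has length 1. The helper names (conv3d_valid, conv2d_valid, stack_channels) and the frame-major channel order used to build form2 are my own assumptions for illustration.

```python
import random

def conv3d_valid(x, w):
    # x: [N][H][W][C] frames, w: [N][kh][kw][C], one filter spanning all N frames.
    # With valid padding the temporal output length is N - N + 1 = 1,
    # so the result is returned as a single 2D map.
    N, C = len(x), len(x[0][0][0])
    H, W = len(x[0]), len(x[0][0])
    kh, kw = len(w[0]), len(w[0][0])
    return [[sum(x[n][i + a][j + b][c] * w[n][a][b][c]
                 for n in range(N) for a in range(kh)
                 for b in range(kw) for c in range(C))
             for j in range(W - kw + 1)]
            for i in range(H - kh + 1)]

def conv2d_valid(x, w):
    # x: [H][W][K] image with K channels, w: [kh][kw][K], one filter.
    H, W, K = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(w), len(w[0])
    return [[sum(x[i + a][j + b][k] * w[a][b][k]
                 for a in range(kh) for b in range(kw) for k in range(K))
             for j in range(W - kw + 1)]
            for i in range(H - kh + 1)]

def stack_channels(t):
    # [N][A][B][C] -> [A][B][N*C], with channel index n*C + c (assumed ordering).
    # Used for both the frames (form1 -> form2) and the 3D kernel.
    N, A, B, C = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[t[n][a][b][c] for n in range(N) for c in range(C)]
             for b in range(B)]
            for a in range(A)]

def random_tensor(d0, d1, d2, d3):
    return [[[[random.uniform(-1.0, 1.0) for _ in range(d3)]
              for _ in range(d2)]
             for _ in range(d1)]
            for _ in range(d0)]

random.seed(0)
N, H, W, C = 4, 6, 5, 3
frames = random_tensor(N, H, W, C)   # form1 without the batch axis
kernel = random_tensor(N, 3, 3, C)   # one (N, 3, 3) filter over 3 channels

out3d = conv3d_valid(frames, kernel)
out2d = conv2d_valid(stack_channels(frames), stack_channels(kernel))

# Largest elementwise difference; only floating-point rounding is expected.
max_diff = max(abs(p - q)
               for row3, row2 in zip(out3d, out2d)
               for p, q in zip(row3, row2))
print(max_diff)
```

Under these assumptions the two outputs agree up to rounding, because both sum over the same N x 3 x 3 x 3 products. As far as I can tell this only holds when the 3D kernel covers all N frames with valid padding along that axis. With "same" padding in the temporal dimension, or a temporal kernel size smaller than N, conv3d would slide along N and keep a temporal axis in its output, which the conv2d version on form2 does not do.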
I would really appreciate it if anyone could help me figure this out.