I'm looking for a way to implement block-diagonal matrices in Tensorflow. Specifically, I have block-diagonal matrix A with N blocks of size S x S each. Further, I have a vector v of length N*S. I want to calculate A dot v. Is there any efficient way to do it in Tensorflow?
Also, I would prefer the implementation which supports a batch dimension of v (e.g. its real dimension is batch_size x (N*S)) and which is memory efficient, keeping in memory only block-diagonal parts of A.
Thanks for any help!