Given a complex square matrix G and a square matrix M, which I can compute quickly from G, I need to efficiently compute the matrix G' defined by the matrix equation G' ⋅ M⊤ = G in my C++ program. Afterwards the original G can be discarded.
Since my experience with numerical linear algebra is limited, I have so far relied on the Armadillo library, which has very nice syntax and can provide good performance. My approach here would be to take the transpose of both sides of the equation and solve the problem M ⋅ (G')⊤ = G⊤ by calling
using namespace arma;
// strans() is the plain transpose; for complex matrices trans() would also conjugate
G = strans(solve(M, strans(G)));
But if I have understood the heavily template-based code correctly, this would involve actually performing the transpositions and copying data around the call to the LAPACK routine cgesv. Of course these are O(N²) operations compared to the O(N³) of the actual linear solver, but I'd still rather avoid them in this case.
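To make concrete what avoiding the transpositions would mean, here is a minimal sketch that uses neither Armadillo nor LAPACK. Row i of G' solves M x = (row i of G), so every row operation that Gaussian elimination applies to M can be applied to the corresponding columns of G instead, and in column-major storage those columns are contiguous. The names `Mat` and `solve_right_transposed` are hypothetical and exist only for this illustration; the code is unblocked and unoptimized, so it shows the access pattern rather than a replacement for a tuned solver.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper: dense n x n matrix stored in column-major order,
// the same layout that Armadillo and LAPACK use.
struct Mat {
    std::size_t n;
    std::vector<cplx> a;
    explicit Mat(std::size_t n_) : n(n_), a(n_ * n_) {}
    cplx& operator()(std::size_t r, std::size_t c) { return a[c * n + r]; }
    const cplx& operator()(std::size_t r, std::size_t c) const { return a[c * n + r]; }
};

// Overwrites G with X such that X * M^T = G (plain transpose, no conjugation).
// M is passed by value because the elimination destroys it.
void solve_right_transposed(Mat M, Mat& G) {
    const std::size_t n = M.n;
    if (G.n != n) throw std::invalid_argument("size mismatch");

    // Forward elimination with partial pivoting. Each row operation on M
    // is mirrored by the same operation on the columns of G.
    for (std::size_t p = 0; p < n; ++p) {
        std::size_t piv = p;
        for (std::size_t r = p + 1; r < n; ++r)
            if (std::abs(M(r, p)) > std::abs(M(piv, p))) piv = r;
        if (std::abs(M(piv, p)) == 0.0) throw std::runtime_error("M is singular");

        if (piv != p) {
            for (std::size_t c = 0; c < n; ++c) std::swap(M(p, c), M(piv, c));
            for (std::size_t i = 0; i < n; ++i) std::swap(G(i, p), G(i, piv));
        }
        for (std::size_t r = p + 1; r < n; ++r) {
            const cplx f = M(r, p) / M(p, p);
            for (std::size_t c = p; c < n; ++c) M(r, c) -= f * M(p, c);
            for (std::size_t i = 0; i < n; ++i) G(i, r) -= f * G(i, p);
        }
    }

    // Back substitution with the upper-triangular factor now held in M.
    for (std::size_t p = n; p-- > 0;) {
        for (std::size_t k = p + 1; k < n; ++k)
            for (std::size_t i = 0; i < n; ++i) G(i, p) -= M(p, k) * G(i, k);
        for (std::size_t i = 0; i < n; ++i) G(i, p) /= M(p, p);
    }
}
```

Calling `solve_right_transposed(M, G)` leaves G' in the storage of G without ever forming G⊤ or (G')⊤; the only copy is the one of M made by passing it by value.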
I have never used LAPACK directly up to now, but is it possible to gain performance by calling a LAPACK routine directly, without having to actually carry out the transpositions? Or would you suggest a different approach?