I am accelerating a model by replacing all of its linear algebra operations with cuBLAS functions. I want to get the FLOP count (time complexity) of the model so that I can evaluate its performance with the roofline model.
There are two kinds of operations in the model: GEMM and TRSM.
I know that the FLOP count of GEMM is approximately 2 * k * m * n, from the question "How to compute the achieved FLOPS of a MPI program which calls cuBlas function":
The standard BLAS GEMM operation is C <- alpha * (A dot B) + beta * C and for A (m by k), B (k by n) and C (m by n), each inner product of a row of A and a column of B multiplied by alpha is 2 * k + 1 flop and there are m * n inner products in A dot B and another 2 * m * n flop for adding beta * C to that dot product. So the total model FLOP count is (2 * k + 3) * (m * n) when alpha and beta are both non-zero.
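For reference, here is how I would turn the quoted GEMM count into numbers. This is only a minimal sketch: the function names and the `achieved_gflops` helper are my own (hypothetical, not part of cuBLAS), and the elapsed time is assumed to be measured separately, e.g. with CUDA events.

```python
def gemm_flops_quoted(m, n, k):
    """FLOP count from the quoted model: (2k + 3) * m * n.

    Assumes alpha and beta are both non-zero, as in the quote.
    """
    return (2 * k + 3) * m * n


def gemm_flops_approx(m, n, k):
    """Leading-order approximation 2 * m * n * k used for large k."""
    return 2 * m * n * k


def achieved_gflops(flops, seconds):
    """Achieved rate in GFLOP/s for a call that took `seconds` (measured elsewhere)."""
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical problem size, chosen only for illustration.
    m, n, k = 4096, 4096, 4096
    quoted = gemm_flops_quoted(m, n, k)
    approx = gemm_flops_approx(m, n, k)
    print("quoted model :", quoted)
    print("2*m*n*k      :", approx)
    print("ratio        :", approx / quoted)
```

For large k the two counts differ only by the lower-order 3 * m * n term, which is why 2 * k * m * n is the figure usually used for GEMM.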
For TRSM, however, I have no idea what the operation count is. All the documents I have found only say it is about O(n^3), which is not precise enough to derive a FLOP count in terms of the matrix dimensions.
Sincerely thank you for your answers!