I got intrigued by the discussion in http://scipy.github.io/old-wiki/pages/PerformanceTips on how to get faster dot computations.
It concludes that taking the dot product of C_contiguous matrices should be faster, and presents the following results:
import numpy as np
from time import time
N = 1000000
n = 40
A = np.ones((N,n))
AT_F = np.ones((n,N), order='F')
AT_C = np.ones((n,N), order='C')
>>> t = time();C = np.dot(A.T, A);t1 = time() - t
3.9203271865844727
>>> t = time();C = np.dot(AT_F, A);t2 = time() - t
3.9461679458618164
>>> t = time();C = np.dot(AT_C, A);t3 = time() - t
2.4167969226837158
I tried it as well (Python 3.7) and the final computation, using C_contiguous matrices, is not faster at all!
I get the following results:
>>> t1
0.2102820873260498
>>> t2
0.4134488105773926
>>> t3
0.28309035301208496
It turns out the first approach is the fastest.
Where does this discrepancy between their results and mine come from? And how can the transpose in the first case not slow the calculation down?
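For reference, here is a smaller, repeatable version of the benchmark that also records the memory layout of each left-hand operand. It is only a sketch under my own assumptions: N is reduced so it runs quickly, timeit replaces the single time() call, and time_dot is a helper name I made up. It does not explain the gap with the wiki numbers, but it does show that A.T is a view on the same buffer as A (no copy is made), with the same layout flags as AT_F.

```python
import numpy as np
from timeit import repeat

N = 100_000  # assumption: smaller than the original 1000000 so this runs quickly
n = 40

A = np.ones((N, n))
AT_F = np.ones((n, N), order='F')
AT_C = np.ones((n, N), order='C')

operands = {'A.T': A.T, 'AT_F': AT_F, 'AT_C': AT_C}

# (C_CONTIGUOUS, F_CONTIGUOUS) flags of each left-hand operand
layouts = {
    name: (op.flags['C_CONTIGUOUS'], op.flags['F_CONTIGUOUS'])
    for name, op in operands.items()
}

# True if A.T reuses A's buffer, i.e. the transpose itself copies nothing
shares_memory = np.shares_memory(A, A.T)


def time_dot(left, right, number=3, rounds=5):
    """Hypothetical helper: best observed time per np.dot call, in seconds."""
    best = min(repeat(lambda: np.dot(left, right), number=number, repeat=rounds))
    return best / number


timings = {name: time_dot(op, A) for name, op in operands.items()}

for name in operands:
    print(name, layouts[name], timings[name])
print('A.T shares memory with A:', shares_memory)
```

The timings will of course depend on the machine and on the BLAS library that NumPy is linked against, so I have not included any expected numbers here.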
Thanks