I am constructing the matrix of partial derivatives of a function in C. The process consists mainly of a large number of small loops, each responsible for filling one column of the matrix. Because the matrix is huge, the code needs to be efficient. I have a number of implementation plans in mind, but I do not want to get into the details here.
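To make the setup concrete, here is a minimal sketch of the kind of loop I mean. It is not my actual code: `entry` is a hypothetical placeholder for the real partial-derivative formula, `fill_column` is just an illustrative name, and storing the matrix column by column (column-major) is an assumption I made so that each loop writes to consecutive memory addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder for one matrix entry; the real
   partial-derivative formula is not shown here. */
static double entry(size_t i, size_t j)
{
    return (double)i + 10.0 * (double)j;
}

/* Fill column j of a matrix with n_rows rows, assuming column-major
   storage: element (i, j) lives at a[i + j * n_rows]. */
static void fill_column(double *a, size_t n_rows, size_t j)
{
    assert(a != NULL);
    double *col = a + j * n_rows;      /* start of column j */
    for (size_t i = 0; i < n_rows; ++i)
        col[i] = entry(i, j);          /* consecutive addresses, stride 1 */
}
```

With a row-major layout the same loop would jump a whole row's worth of elements in memory on every iteration, which is the kind of access pattern I would like to understand better.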
I know that smart compilers try to take advantage of the cache automatically, but I would like to understand in more detail how to use the cache well and how to write efficient code and efficient loops. I would appreciate any resources or websites where I can learn more about writing efficient code, in terms of reducing memory access time and taking advantage of the cache.
I know that my request may look sloppy, but I am not a computer guy. I did some research, but without success, so any help is appreciated.
Thanks