In a numerical physics project of mine, I'd like to compare memory usage of different methods for solving the same problem.
I've found out that I can include <sys/resource.h> and use getrusage() to get the maximum amount of used memory in ru_maxrss (with some caveats that I don't think I need to care about).
For benchmarking, I essentially run code blocks like these for all the different methods I've implemented:
#include <sys/resource.h>  // getrusage(), struct rusage
#include <chrono>
#include <iostream>

// ... inside main():
int minN = 6;
int maxN = 16;
std::chrono::steady_clock::time_point start;
std::chrono::steady_clock::time_point finish;
std::cout << "Naive:" << std::endl;
for (int N = minN; N <= maxN; N += 2) {
    struct rusage usage{};
    start = std::chrono::steady_clock::now();
    // do work...
    finish = std::chrono::steady_clock::now();
    // ru_maxrss is the peak resident set size of the process so far
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        std::cerr << "getrusage failed" << std::endl;
    }
    long time_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
    long max_ram_kb = usage.ru_maxrss;  // KB according to the documentation
    std::cout << "N = " << N << ", time = " << time_ns/1e9 << " s, ram = " << max_ram_kb << " KB" << std::endl;
}
Now, the problem is that ru_maxrss contains the maximum amount of used memory for the whole lifetime of the program, i.e. it is not reduced when a "large" object goes out of scope. Thus, the output of the whole program will look something like this:
Naive:
N = 6, time = 0.022541 s, ram = 8028 KB
N = 8, time = 0.0234674 s, ram = 65360 KB
N = 10, time = 0.373676 s, ram = 135284 KB
N = 12, time = 21.7536 s, ram = 631792 KB
Magnetization:
N = 6, time = 0.000166585 s, ram = 631792 KB
N = 8, time = 0.00158378 s, ram = 631792 KB
N = 10, time = 0.022255 s, ram = 631792 KB
N = 12, time = 0.405172 s, ram = 631792 KB
Momentum:
N = 6, time = 0.000175482 s, ram = 631792 KB
N = 8, time = 0.000766058 s, ram = 631792 KB
N = 10, time = 0.00658272 s, ram = 631792 KB
N = 12, time = 0.0728279 s, ram = 631792 KB
Parity:
N = 8, time = 0.000986243 s, ram = 631792 KB
N = 12, time = 0.0528302 s, ram = 631792 KB
Spin Inversion:
N = 8, time = 0.00111167 s, ram = 631792 KB
N = 12, time = 0.050363 s, ram = 631792 KB
Once memory usage has peaked, the reported memory usage of my benchmark is useless. I realize that, in principle, this is how getrusage() is supposed to work. Is there a way to reset this metric? Or can anyone recommend another easy way to measure memory usage from inside the program that does not involve using specific benchmarking libraries?
Regards
PS: Does anyone know whether, or in which cases, ru_maxrss is in B or KB? For N = 8, I store a matrix with 65536 double elements. This matrix should dominate memory usage and I'd expect it to take up about 65536 Bytes of memory. My benchmark reports that I use 65360 KB, as the documentation of getrusage() says the result is in KB. This is eerily close to the estimated number of Bytes I was expecting. So is the result really in KB and this is purely a coincidence?
Update:
I got what I wanted by parsing /proc/self/stat. I'll share my updated code below in case anyone finds this in the future. Note that rss, the 24th entry of stat, is given in pages, so one must multiply it by the page size (4096 bytes on my system) to get an approximation of the used amount of RAM in bytes.
std::cout << "Naive:" << std::endl;
for (int N = minN; N <= maxN; N += 2) {
    start = std::chrono::steady_clock::now();
    // do work...
    finish = std::chrono::steady_clock::now();
    // Read the current resident set size (in pages) from /proc/self/stat
    std::ifstream statFile("/proc/self/stat");
    std::string statLine;
    std::getline(statFile, statLine);
    std::istringstream iss(statLine);
    std::string entry;
    long long memUsage = 0;
    for (int i = 1; i <= 24; i++) {
        std::getline(iss, entry, ' ');
        if (i == 24) {
            // rss is the 24th space-separated entry; stoll avoids int overflow
            memUsage = std::stoll(entry);
        }
    }
    long time_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
    std::cout << "N = " << N << ", time = " << time_ns/1e9 << " s, ram = " << 4096*memUsage/1e9 << " GB" << std::endl;
}