
While this might look like a duplicate from other questions, let me explain why it's not.

I want a specific part of my application to degrade gracefully when a certain memory limit is reached. I could have used a criterion based on the remaining available physical memory, but that wouldn't be safe: the OS could start paging out my application's memory before the criterion is met, so the application would think there is still physical memory left and keep allocating. For the same reason, I can't use the amount of physical memory currently used by the process: as soon as the OS started swapping me out, that number would stop growing while I kept allocating.

For this reason, I chose a criterion based on the amount of memory allocated by my application, i.e. something very close to its virtual memory size.

This question (How to determine CPU and memory consumption from inside a process?) provides great ways of querying the amount of virtual memory used by the current process, which I THOUGHT was what I needed.

On Windows, I'm using GetProcessMemoryInfo() and the PrivateUsage field, which works great.

On Linux, I tried several things (listed below) that did not work. The reason why virtual memory usage does not work for me is because of something that happens with OpenCL context creation on NVidia hardware on Linux. The driver reserves a region of the virtual memory space big enough to hold all RAM, all swap and all video memory. My guess is it does so for unified address space and everything. But it also means that the process reports using enormous amounts of memory. On my system for instance, top will report 23.3 Gb in the VIRT column (12 Gb of RAM, 6 Gb of swap, 2 Gb of video memory, which gives 20 Gb reserved by the NVidia driver).

On OSX, using task_info() and the virtual_size field, I also get a bigger number than expected (a few Gb for an app that uses nowhere near 1 Gb on Windows), but not as big as on Linux.

So here is the big question: how can I get the amount of memory allocated by my application? I know this is a somewhat vague question (what does "allocated memory" mean?), but I'm flexible:

  • I would prefer to include the application static data, code section and everything, but I can live without.
  • I would prefer to include the memory allocated for stacks, but I can live without.
  • I would prefer to include the memory used by shared libraries, but I can live without.
  • I don't really care for mmap stuff, I can do with or without at that point.
  • Etc.

What is really important is that the number grows with dynamic allocation (new, malloc, anything) and shrinks when the memory is released (which I know can be implementation-dependent).

Things I have tried

Here are a couple of solutions I have tried and/or thought of but that would not work for me.

  1. Read from /proc/self/status

    This is the approach suggested by how-to-determine-cpu-and-memory-consumption-from-inside-a-process. However, as stated above, this returns the amount of virtual memory, which does not work for me.

  2. Read from /proc/self/statm

    Very slightly worse: according to http://kernelnewbies.kernelnewbies.narkive.com/iG9xCmwB/proc-pid-statm-doesnt-match-with-status, which refers to Linux kernel code, the only difference between those two values is that the second one does not subtract reserved_vm from the amount of virtual memory. I would have HOPED that reserved_vm would include the memory reserved by the OpenCL driver, but it does not.

  3. Use mallinfo() and the uordblks field

    This does not seem to include all the allocations (I'm guessing allocations made with new are missing), since for a +2 Gb growth in virtual memory space (after doing some memory-heavy work and still holding the memory), I'm only seeing about a 0.1 Gb growth in the number returned by mallinfo().

  4. Read the [heap] section size from /proc/self/smaps

    This value started at around 336,760 Kb and peaked at 1,019,496 Kb for work that grew the virtual memory space by +2 Gb, and then it never goes back down, so I'm not sure I can really rely on this number...

  5. Monitor all memory allocations in my application

    Yes, in an ideal world, I would have control over everybody who allocates memory. However, this is a legacy application, using tons of different allocators, some mallocs, some news, some OS-specific routines, etc. There are some plug-ins that could do whatever they want, they could be compiled with a different compiler, etc. So while this would be great to really control memory, this does not work in my context.

  6. Read the virtual memory size before and after the OpenCL context initialization

    While this could be a "hacky" way to solve the problem (and I might have to fallback to it), I would really wish for a more reliable way to query memory, because OpenCL context could be initialized somewhere out of my control, and other similar but non-OpenCL specific issues could creep in and I wouldn't know about it.
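For reference, approach (1) above — reading the VmSize field out of /proc/self/status — can be sketched like this (minimal error handling, Linux only):

```c
#include <stdio.h>
#include <string.h>

/* Returns the VmSize field of /proc/self/status in kB, or 0 on failure. */
long getVmSizeKb(void)
{
    FILE *file = fopen("/proc/self/status", "r");
    if (!file)
        return 0;

    long kb = 0;
    char line[256];
    while (fgets(line, sizeof line, file) != NULL)
    {
        if (strncmp(line, "VmSize:", 7) == 0)
        {
            sscanf(line + 7, "%ld", &kb);
            break;
        }
    }
    fclose(file);
    return kb;
}
```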
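Approach (2) is even shorter to sketch — the first field of /proc/self/statm is the total program size in pages, so multiply by the page size to get bytes:

```c
#include <stdio.h>
#include <unistd.h>

/* Returns the total program size from /proc/self/statm, in bytes. */
long getStatmSizeBytes(void)
{
    FILE *file = fopen("/proc/self/statm", "r");
    if (!file)
        return 0;

    long pages = 0;
    if (fscanf(file, "%ld", &pages) != 1)
        pages = 0;
    fclose(file);
    return pages * sysconf(_SC_PAGESIZE);
}
```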
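And approach (3), for completeness — note that mallinfo()'s fields are plain ints and wrap past 2 Gb; newer glibc (2.33+) offers mallinfo2() with size_t fields:

```c
#include <malloc.h>

/* Bytes currently allocated from the main arena, as seen by glibc malloc. */
size_t getMallocedBytes(void)
{
    struct mallinfo mi = mallinfo();
    return (size_t)(unsigned int)mi.uordblks; /* int field, wraps past 2 Gb */
}
```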
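Finally, approach (4) — summing the Size: lines of the [heap] mapping(s) in /proc/self/smaps — looks roughly like this (a sketch; with glibc the main arena usually shows up as a single [heap] mapping):

```c
#include <stdio.h>
#include <string.h>

/* Sums the Size: fields of all [heap] mappings in /proc/self/smaps, in kB. */
long getHeapSizeKb(void)
{
    FILE *file = fopen("/proc/self/smaps", "r");
    if (!file)
        return 0;

    long kb = 0;
    int in_heap = 0;
    char line[512];
    while (fgets(line, sizeof line, file) != NULL)
    {
        unsigned long start, end;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2)
        {
            /* A new mapping header: check whether it is the heap. */
            in_heap = (strstr(line, "[heap]") != NULL);
        }
        else if (in_heap && strncmp(line, "Size:", 5) == 0)
        {
            long size = 0;
            sscanf(line + 5, "%ld", &size);
            kb += size;
        }
    }
    fclose(file);
    return kb;
}
```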

So that's pretty much all I've got. There is one more thing I have not tried yet, because it only works on OSX: the approach described in Why does mstats and malloc_zone_statistics not show recovered memory after free?, i.e. using malloc_get_all_zones() and malloc_zone_statistics(). But I think this might have the same problem as mallinfo(), i.e. not taking all allocations into account.

So, can anyone suggest a way to query memory usage (as vague of a term as this is, see above for precision) of a given process in Linux (and also OSX even if it's a different method)?


3 Answers


You can try and use information returned by getrusage():

#include <sys/time.h>
#include <sys/resource.h>

int getrusage(int who, struct rusage *usage);

struct rusage {
    struct timeval ru_utime; /* user CPU time used */
    struct timeval ru_stime; /* system CPU time used */
    long   ru_maxrss;        /* maximum resident set size */
    long   ru_ixrss;         /* integral shared memory size */
    long   ru_idrss;         /* integral unshared data size */
    long   ru_isrss;         /* integral unshared stack size */
    long   ru_minflt;        /* page reclaims (soft page faults) */
    long   ru_majflt;        /* page faults (hard page faults) */
    long   ru_nswap;         /* swaps */
    long   ru_inblock;       /* block input operations */
    long   ru_oublock;       /* block output operations */
    long   ru_msgsnd;        /* IPC messages sent */
    long   ru_msgrcv;        /* IPC messages received */
    long   ru_nsignals;      /* signals received */
    long   ru_nvcsw;         /* voluntary context switches */
    long   ru_nivcsw;        /* involuntary context switches */
};

If the memory information does not fit your purpose, observing the page fault counts can help you monitor memory stress, which is what you intend to detect.
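If peak usage is enough for your limit check, ru_maxrss is one of the fields Linux does maintain (since 2.6.32). A minimal sketch — note the unit difference between platforms:

```c
#include <sys/resource.h>

/* Peak resident set size of the calling process.
 * Linux reports kilobytes; OS X reports bytes. */
long getPeakRss(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return 0;
    return usage.ru_maxrss;
}
```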

chqrlie
    Thanks for the suggestion. I tried it (reading `ru_ixrss`, `ru_idrss` and `ru_isrss`) and all fields have 0. According to the man page, "Not all fields are completed; unmaintained fields are set to zero by the kernel." and "In Linux 2.4 only the fields ru_utime, ru_stime, ru_minflt, and ru_majflt are maintained. Since Linux 2.6, ru_nvcsw and ru_nivcsw are also maintained." – Martin Bisson Jul 21 '16 at 15:40

Have you tried a shared library interposer for Linux for section (5) above? So long as your application is not statically linking the malloc functions, you can interpose a new function between your program and the kernel malloc. I've used this tactic many times to collect stats on memory usage.

It does require setting LD_PRELOAD before running the program, but it needs no source or binary changes. It is an ideal answer in many cases.

Here is an example of a malloc interposer:

http://www.drdobbs.com/building-library-interposers-for-fun-and/184404926

You will probably also want to interpose calloc and free. Calls to new generally end up as calls to malloc, so C++ is covered as well.

OS X seems to have similar capabilities but I have not tried it.

http://tlrobinson.net/blog/2007/12/overriding-library-functions-in-mac-os-x-the-easy-way-dyld_insert_libraries/

--Matt

Matthew Fisher

Here is what I ended up using. I scan /proc/self/maps and sum the size of all the address ranges meeting my criteria, which is:

  • Only include ranges from inode 0 (i.e. no devices, no mapped file, etc.)
  • Only include ranges that are at least one of readable, writable or executable
  • Only include private memory
    • In my experiments I did not see instances of shared memory from inode 0. Maybe with inter-process shared memory...?

Here is the code for my solution:

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

size_t getValue()
{
    FILE* file = fopen("/proc/self/maps", "r");
    if (!file)
    {
        assert(0);
        return 0;
    }

    size_t value = 0;

    char line[1024];
    while (fgets(line, 1024, file) != NULL)
    {
        ptrdiff_t start_address, end_address;
        char perms[4];
        ptrdiff_t offset;
        int dev_major, dev_minor;
        unsigned long int inode;
        const int nb_scanned = sscanf(
            line, "%16tx-%16tx %c%c%c%c %16tx %02x:%02x %lu",
            &start_address, &end_address,
            &perms[0], &perms[1], &perms[2], &perms[3],
            &offset, &dev_major, &dev_minor, &inode
            );
        if (10 != nb_scanned)
        {
            assert(0);
            continue;
        }

        if ((inode == 0) &&
            (perms[0] != '-' || perms[1] != '-' || perms[2] != '-') &&
            (perms[3] == 'p'))
        {
            assert(dev_major == 0);
            assert(dev_minor == 0);
            value += (end_address - start_address);
        }
    }

    fclose(file);

    return value;
}

Since this is looping through all the lines in /proc/self/maps, querying memory that way is significantly slower than using "Virtual Memory currently used by current process" from How to determine CPU and memory consumption from inside a process?.

However, it provides an answer much closer to what I need.
