1

Need to write an application in C/C++ on Linux that receives a stream of bytes from a socket and process them. The total bytes could be close to 1TB. If I have unlimited amount memory, I will just put it all in the memory, so my application can easily process data. It's much easy to do many things on flat memory space, such as memmem(), memcmp() ... On a circular buffer, the application has to be extra smart to be aware of the circular buffer.

I have about 8G of memory, but luckily due to locality, my application never needs to go back by more than 1GB from the latest data it received. Is there a way to have a 1TB buffer, with only the latest 1GB data mapped to physical memory? If so, how to do it?

Any ideas? Thanks.

packetie
  • 4,839
  • 8
  • 37
  • 72
  • 2
    Well, you gave the answer yourself? Use a circular buffer. Far easier and cleaner than any strange hack in the memory management. – deviantfan Aug 22 '15 at 22:56
  • 2
    Depends on the OS an architecture. Using a 64-bit Linux or *BSD or Mac OS X,, you can do this easily with an anonymous [`mmap()`](http://man7.org/linux/man-pages/man3/posix_madvise.3.html), discarding data no longer needed with `munmap()`. In Linux, you can set up a terabyte mapping and just clear out unneeded parts with [`madvise(,,MADV_DONTNEED)`](http://man7.org/linux/man-pages/man2/madvise.2.html). – Nominal Animal Aug 22 '15 at 22:59
  • I think @NominalAnimal is right, mmap would be the way to go here. Don't forget to use `MAP_SHARED` and check `ulimit -a` (see [here](https://stackoverflow.com/questions/2909955)). – Wander Nauta Aug 22 '15 at 23:01
  • Thanks @NominalAnimal and @wander-nauta, I like your suggestion about `madvise()`. But in this case, I would need to know how to do `malloc(<1TB>)` without mapping any physical page (otherwise it will fail). – packetie Aug 22 '15 at 23:22
  • @deviantfan, I have used circular buffer approach, the processing can be very complex, flat memory will make super clean. – packetie Aug 22 '15 at 23:23
  • @codingFun With a tiny buffer class, with methods `add(data,length)` and`get(buffer,offset,length)`, I don´t see a "very complex". – deviantfan Aug 22 '15 at 23:27
  • 1
    No, you do not malloc a terabyte. You would use mmap() to map the first gigabyte, or so, of your memory. Then, you keep munmap()ing the first chunk of the mmap() memory, then after munmap()ing it, you mmap() additional space beyond your initial chunk. You will need to be diligent and map and unmap memory in page-sized chunks, for this to work. See the mmap(2) manual page for more information. – Sam Varshavchik Aug 22 '15 at 23:27
  • 1
    ...Linux macros that have been around for awhile for ***[circular buffers](https://www.kernel.org/doc/Documentation/circular-buffers.txt)***. – ryyker Aug 22 '15 at 23:37
  • Thanks @SamVarshavchik for the suggestion, I thought of it. The concern I have is, the application may have to do some `malloc(), free()`, there is a chance it may collide with the buffer I allocated using mmap() in terms of virtual memory. – packetie Aug 23 '15 at 00:14
  • @ryyker, thanks for the link. – packetie Aug 23 '15 at 00:17
  • @SamVarshavchik: In Linux/Unix/*BSDs, you can just `mmap()` the entire terabyte; individual pages are only allocated when you try to access the memory. `madvise(,,MADV_DONTNEED)` in Linux discards the pages, releasing them for other use; on next access, they'll read zero again. On all systems, `malloc()` and `free()` are aware of memory maps, and will not collide. (Most C libraries use anonymous memory maps for large allocations anyway.) – Nominal Animal Aug 23 '15 at 00:25
  • All that said, in practice, I'd probably just use a 2-3 gigabyte buffer, moving the contents down by one gigabyte whenever the data in the first gigabyte is no longer needed. I haven't checked whether [`mremap()`](http://man7.org/linux/man-pages/man2/mremap.2.html)ing is faster than `memmove()`, so I'd check that first, but it's not at all difficult. – Nominal Animal Aug 23 '15 at 00:33
  • @NominalAnimal, I thought about periodically moving data down to the first 1GB too. The problem is, the application built up lots of data structures using pointers (with a good reason). It's impractical to change all the data structures when you shift data in memory. – packetie Aug 23 '15 at 01:50
  • @codingFun: Why would you use pointers instead of offsets from the beginning of the data? – Nominal Animal Aug 23 '15 at 20:22
  • @NominalAnimal, I don't write all the application code :-( I get the data frames and call other developer's code with pointer to a frame of data that I just received. Other developers assume the data frame will be persistent in memory so their code can do things like string comparison between the current data frame and previous data frame (they don't save previous data frame!). My job is to spoil them (so that their coding is simple), hence my challenge. – packetie Aug 24 '15 at 20:48

1 Answers1

1

Here's an example. It sets up a full terabyte mapping, but initially inaccessible (PROT_NONE). You, the programmer, maintain a window that can only extend and move upwards in memory. The example program uses a one and a half gigabyte window, advancing it in steps of 1,023,739,137 bytes (the mapping_use() makes sure the available pages cover at least the desired region), and does actually modify every page in every window, just to be sure.

#define _GNU_SOURCE
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>

typedef struct mapping  mapping;
struct mapping {
    unsigned char   *head;  /* Start of currently accessible region */
    unsigned char   *tail;  /* End of currently accessible region */ 
    unsigned char   *ends;  /* End of region */
    size_t           page;  /* Page size of this mapping */
};

/* Discard mapping.
*/
void mapping_free(mapping *const m)
{
    if (m && m->ends > m->head) {
        munmap(m->head, (size_t)(m->ends - m->head));
        m->head = NULL;
        m->tail = NULL;
        m->ends = NULL;
        m->page = 0;
    }
}

/* Move the accessible part up in memory, to [from..to).
*/
int mapping_use(mapping *const m, void *const from, void *const to)
{
    if (m && m->ends > m->head) {
        unsigned char *const head = ((unsigned char *)from <= m->head) ? m->head :
                                    ((unsigned char *)from >= m->ends) ? m->ends :
                                    m->head + m->page * (size_t)(((size_t)((unsigned char *)from - m->head)) / m->page);
        unsigned char *const tail = ((unsigned char *)to <= head) ? head :
                                    ((unsigned char *)to >= m->ends) ? m->ends :
                                    m->head + m->page * (size_t)(((size_t)((unsigned char *)to - m->head) + m->page - 1) / m->page); 

        if (head > m->head) {
            munmap(m->head, (size_t)(head - m->head));
            m->head = head;
        }

        if (tail > m->tail) {
#ifdef USE_MPROTECT
            mprotect(m->tail, (size_t)(tail - m->tail), PROT_READ | PROT_WRITE);
#else
            void *result;
            do {
                result = mmap(m->tail, (size_t)(tail - m->tail), PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_FIXED | MAP_PRIVATE | MAP_NORESERVE, -1, (off_t)0);
            } while (result == MAP_FAILED && errno == EINTR);
            if (result == MAP_FAILED)
                return errno = ENOMEM;
#endif
            m->tail = tail;
        }

        return 0;
    }
    return errno = EINVAL;
}

/* Initialize a mapping.
*/
int mapping_create(mapping *const m, const size_t size)
{
    void  *base;
    size_t page, truesize;

    if (!m || size < (size_t)1)
        return errno = EINVAL;

    m->head = NULL;
    m->tail = NULL;
    m->ends = NULL;
    m->page = 0;

    /* Obtain default page size. */
    {
        long   value = sysconf(_SC_PAGESIZE);
        page = (size_t)value;
        if (value < 1L || (long)page != value)
            return errno = ENOTSUP;
    }

    /* Round size up to next multiple of page. */
    if (size % page)
        truesize = size + page - (size % page);
    else
        truesize = size;

    /* Create mapping. */
    do {
        errno = ENOTSUP;
        base = mmap(NULL, truesize, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, (off_t)0);
    } while (base == MAP_FAILED && errno == EINTR);
    if (base == MAP_FAILED)
        return errno;

    /* Success. */
    m->head = base;
    m->tail = base;
    m->ends = (unsigned char *)base + truesize;
    m->page = page;

    errno = 0;
    return 0;
}

static void memtouch(void *const ptr, const size_t size)
{
    if (ptr && size > 0) {
        unsigned char *mem = (unsigned char *)ptr;
        const size_t   step = 2048;
        size_t         n = size / (size_t)step - 1;

        mem[0]++;
        mem[size-1]++;

        while (n-->0) {
            mem += step;
            mem[0]++;
        }
    }
}

int main(void)
{
    const size_t   size = (size_t)1024 * (size_t)1024 * (size_t)1024 * (size_t)1024;
    const size_t   need = (size_t)1500000000UL;
    const size_t   step = (size_t)1023739137UL;
    unsigned char *base;
    mapping        map;
    size_t         i;

    if (mapping_create(&map, size)) {
        fprintf(stderr, "Cannot create a %zu-byte mapping: %m.\n", size);
        return EXIT_FAILURE;
    }

    printf("Have a %zu-byte mapping at %p to %p.\n", size, (void *)map.head, (void *)map.ends);
    fflush(stdout);


    base = map.head;

    for (i = 0; i <= size - need; i += step) {
        printf("Requesting %p to %p .. ", (void *)(base + i), (void *)(base + i + need));
        fflush(stdout);
        if (mapping_use(&map, base + i, base + i + need)) {
            printf("Failed (%m).\n");
            fflush(stdout);
            return EXIT_FAILURE;
        }
        printf("received %p to %p.\n", (void *)map.head, (void *)map.tail);
        fflush(stdout);
        memtouch(base + i, need);
    }

    mapping_free(&map);

    return EXIT_SUCCESS;
}

The approach is twofold. First, an inaccessible (PROT_NONE) mapping is created to reserve the necessary virtual contiguous address space. If we omit this step, it would make it possible for a malloc() call or similar to acquire pages within this range, which would defeat the entire purpose; a single terabyte-long mapping.

Second, when the accessible window extends into the region, either mprotect() (if USE_MPROTECT is defined), or mmap() is used to make the required pages accessible. Pages no longer needed are completely unmapped.

Compile and run using

gcc -Wall -Wextra -std=c99 example.c -o example
time ./example

or, to use mmap() only once and mprotect() to move the window,

gcc -DUSE_MPROTECT=1 -Wall -Wextra -std=c99 example.c -o example
time ./example

Note that you probably don't want to run the test if you don't have at least 4GB of physical RAM.

On this particular machine (i5-4200U laptop with 4GB of RAM, 3.13.0-62-generic kernel on Ubuntu x86_64), quick testing didn't show any kind of performance difference between mprotect() and mmap(), in execution speed or resident set size.

If anyone bothers to compile and run the above, and finds that one of them has a repeatable benefit/drawback (resident set size or time used), I'd very much like to know about it. Please also define your kernel and CPU used.

I'm not sure which details I should expand on, since this is pretty straightforward, really, and the Linux man pages project man 2 mmap and man 2 mprotect pages are quite descriptive. If you have any questions on this approach or program, I'd be happy to try and elaborate.

Nominal Animal
  • 38,216
  • 5
  • 59
  • 86
  • Thanks @nominal-animal for the sample code. I ran it with and without MPROTECT, no performance difference. Will study it some more. – packetie Aug 25 '15 at 15:28