
I need to memory-map a huge file. The file is one terabyte.

However, I am getting errno ENOMEM (12, "Cannot allocate memory"). I don't understand what is holding me up. Querying RLIMIT_AS returns 18446744073709551615 (RLIM_INFINITY), which is enough. My system is 64-bit, so my virtual address space should not be too small either, and ulimit -v reports unlimited.

I created the data with Python using np.lib.format.open_memmap, thus it is physically possible. I'm trying to read it in C. Reading it in Python is no problem: numpy.load('terabytearray.npy', mmap_mode='r') works.


Here is a minimal example.

Create a NumPy array like this:

import numpy as np

shape = (75000, 5000000)
filename = 'datafile.obj'

if __name__ == '__main__':
  arr = np.lib.format.open_memmap(filename, mode='w+', dtype=np.float32, shape=shape)
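For scale, the shape and dtype in this script determine the size of the data section (float32 is 4 bytes per element; the small .npy header is ignored here):

```latex
75\,000 \times 5\,000\,000 \times 4\ \text{bytes} = 1.5 \times 10^{12}\ \text{bytes} \approx 1.36\ \text{TiB}
```

So the file is somewhat larger than one terabyte.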

Read it like this:

#include <stdbool.h>
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#include <sys/time.h>
#include <sys/resource.h>

#include <stdio.h>
#include <errno.h>

typedef enum {
  CNPY_LE, /* little endian (least significant byte to most significant byte) */
  CNPY_BE, /* big endian (most significant byte to least significant byte) */
  CNPY_NE, /* no / neutral endianness (each element is a single byte) */
  /* Host endianness is not supported because it is an incredibly bad idea to
     use it for storage. */
} cnpy_byte_order;

typedef enum {
  CNPY_B = 0, /* We want to use the values as index to the following arrays. */
  CNPY_I1,
  CNPY_I2,
  CNPY_I4,
  CNPY_I8,
  CNPY_U1,
  CNPY_U2,
  CNPY_U4,
  CNPY_U8,
  CNPY_F4,
  CNPY_F8,
  CNPY_C8,
  CNPY_C16,
} cnpy_dtype;

typedef enum {
  CNPY_C_ORDER,       /* C order (row major) */
  CNPY_FORTRAN_ORDER, /* Fortran order (column major) */
} cnpy_flat_order;

typedef enum {
  CNPY_SUCCESS,      /* success */
  CNPY_ERROR_FILE,   /* some error regarding handling of a file */
  CNPY_ERROR_MMAP,   /* some error regarding mmaping a file */
  CNPY_ERROR_FORMAT, /* file format error while reading some file */
} cnpy_status;

#define CNPY_MAX_DIM 4
typedef struct {
  cnpy_byte_order byte_order;
  cnpy_dtype dtype;
  cnpy_flat_order order;
  size_t n_dim;
  size_t dims[CNPY_MAX_DIM];
  char *raw_data;
  size_t data_begin;
  size_t raw_data_size;
} cnpy_array;

cnpy_status cnpy_open(const char * const fn, bool writable, cnpy_array *arr) {
  assert(arr != NULL);

  cnpy_array tmp_arr;

  /* open, mmap, and close the file */
  int fd = open(fn, writable? O_RDWR : O_RDONLY);
  if (fd == -1) {
    return CNPY_ERROR_FILE;
  }
  size_t raw_data_size = (size_t) lseek(fd, 0, SEEK_END);
  lseek(fd, 0, SEEK_SET);
  printf("%zu\n", raw_data_size); /* %zu is the correct conversion for size_t */
  if (raw_data_size == 0) {
    close(fd); /* no point in checking for errors */
    return CNPY_ERROR_FORMAT;
  }
  if (raw_data_size == SIZE_MAX) {
    /* This is just because the author is too lazy to check for overflow on every pos+1 calculation. */
    close(fd);
    return CNPY_ERROR_FORMAT;
  }

  void *raw_data = mmap(
    NULL,
    raw_data_size,
    PROT_READ | PROT_WRITE,
    writable? MAP_SHARED : MAP_PRIVATE,
    fd,
    0 
  );

  if (raw_data == MAP_FAILED) {
    close(fd);
    return CNPY_ERROR_MMAP;
  }

  if (close(fd) != 0) {
    munmap(raw_data, raw_data_size);
    return CNPY_ERROR_FILE;
  }

  /* parse the file */
  // cnpy_status status = cnpy_parse(raw_data, raw_data_size, &tmp_arr); // library call ignore
  // if (status != CNPY_SUCCESS) {
  //   munmap(raw_data, raw_data_size);
  //   return status;
  // }
  // *arr = tmp_arr;

  return CNPY_SUCCESS;
}

int main(){

  cnpy_array arr = {0}; /* empty braces {} are not valid C11 */
  cnpy_status status = cnpy_open("datafile.obj", false, &arr);

  printf("status %i\n",(int) status);
  if(status != CNPY_SUCCESS){
    printf("failure\n");
    printf("errno %i\n", errno);
  }


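  struct rlimit lim;
  printf("getrlimit RLIMIT_AS %s\n", (getrlimit(RLIMIT_AS, &lim) == 0 ? "success" : "failure") );
  /* rlim_t has no dedicated printf conversion, so cast to the widest unsigned type */
  printf("lim.rlim_cur %llu\n", (unsigned long long) lim.rlim_cur);
  printf("lim.rlim_max %llu\n", (unsigned long long) lim.rlim_max);
  printf("RLIM_INFINITY %llu\n", (unsigned long long) RLIM_INFINITY);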


  return 0;
}

Compile with:

gcc -std=c11 -o mmap_testing main.c

I'm using the ~quf/cnpy library to work with the NumPy files; I included the relevant parts above.

Tarick Welling
  • *I created the data with python using `np.lib.format.open_memmap` thus it is physically possible.* How did you verify that `np.lib.format.open_memmap` mapped the entire 1 TB all at once? – Andrew Henle Jan 24 '23 at 21:03
  • To judge what you might be doing wrong and to avoid (some) unhelpful suggestions, we need to see a minimal reproducible example demonstrating what you are trying to do and how you are triggering (and detecting) the error you describe. – John Bollinger Jan 24 '23 at 21:10
  • @JohnBollinger fixed it, example included – Tarick Welling Jan 24 '23 at 21:49
  • @AndrewHenle I called it as in my (now added) example, and how else would it do it? There is no problem calling `numpy.load('terabytearray.npy', mmap_mode='r')` – Tarick Welling Jan 24 '23 at 21:50
  • The protection mode of the mapping must be consistent with the open mode of the file. In your example program, you will open the file read-only, but attempt to create a writable mapping on top of it. That you use `MAP_PRIVATE` does not rescue this situation. There may be other issues, but that one jumped out at me pretty quickly. – John Bollinger Jan 24 '23 at 21:55
  • @TarickWelling That is not a promise that numpy uses mmap in one contiguous segment. It _could_ be mapping it in, say, 64 MB segments as it goes. In fact, the [documentation](https://numpy.org/doc/stable/reference/generated/numpy.memmap.html#numpy.memmap) implies as much: `Memory-mapped files are used for accessing small segments of large files on disk, without reading the entire file into memory.` – Max Jan 24 '23 at 21:56
  • Also, if you want to check `errno` then you must do so immediately after the failing function call returns (and it only makes sense if the function is documented to set `errno` when it fails). There is no guarantee that `errno` will go unmodified by other functions you call afterward, even if they succeed. – John Bollinger Jan 24 '23 at 21:59
  • @Max sure, but it is. numpy mmap (https://github.com/numpy/numpy/blob/2303556949b96c4220ed86fa4554f6a87dec3842/numpy/core/memmap.py#L268) is implemented with `import mmap`, which on Unix means the same mmap function. – Tarick Welling Jan 24 '23 at 22:14
  • @JohnBollinger The C code posted appears to come from https://sr.ht/%7Equf/cnpy/. It appears to be a header-only C implementation (?!?!?) and it looks to be where code such as `(size_t) lseek(fd, 0, SEEK_END);` comes from (for other readers: that's what `fstat()` is for...). – Andrew Henle Jan 24 '23 at 22:15
  • @AndrewHenle yes it is that library, I linked it at the bottom of my question. The code is terrible and doesn't compile with C++ until you split it properly but at least it uses `mmap` instead of reading the whole file which is what I needed :) – Tarick Welling Jan 24 '23 at 22:20
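Following John Bollinger's errno comment and Andrew Henle's fstat() remark, here is a minimal sketch of how the open/size/map steps could be written. The helper map_file_readonly is a hypothetical name, not part of cnpy. It takes the size from fstat() instead of lseek(), maps with PROT_READ only so the protection matches the O_RDONLY descriptor, and saves errno immediately after whichever call failed, before close() can overwrite it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of cnpy): map a whole file read-only.
 * On success returns the mapping and stores its length in *size_out.
 * On failure returns NULL and stores the errno of the call that failed
 * in *err_out, captured before any later call could overwrite it. */
static void *map_file_readonly(const char *path, size_t *size_out, int *err_out)
{
    assert(path != NULL && size_out != NULL && err_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        *err_out = errno;
        return NULL;
    }

    struct stat st;
    if (fstat(fd, &st) == -1) {        /* file size via fstat(), not lseek() */
        *err_out = errno;
        close(fd);
        return NULL;
    }
    if (st.st_size <= 0) {             /* mmap() rejects a length of 0 */
        *err_out = EINVAL;
        close(fd);
        return NULL;
    }

    size_t size = (size_t)st.st_size;
    /* PROT_READ only, consistent with the O_RDONLY descriptor */
    void *p = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        *err_out = errno;              /* save it before close() runs */
        close(fd);
        return NULL;
    }

    close(fd);                         /* the mapping stays valid after close() */
    *size_out = size;
    return p;
}
```

A caller can then report the saved code with strerror(err) instead of reading errno later in main(), after other calls may have changed it.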

2 Answers


Memory-mapping a file that was opened read-only with mmap(..., PROT_READ | PROT_WRITE, MAP_PRIVATE, ...) requires the kernel to reserve anonymous backing store (or swap space) for the mapping. That is because the application could modify any or even all of the data after it is mapped, and the kernel must then have somewhere to put the modified pages (backing store or swap space) if they need to be paged out.

If a file is mapped with read/write permissions and mmap(..., MAP_SHARED, ...), the file itself is the "backing store" because if any data needs to be swapped out, it can be written to the file itself. Thus a MAP_SHARED mapping does not need any swap space reservations.
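A minimal sketch of the read/write MAP_SHARED case, assuming the file already exists and is at least size bytes long (touching a page beyond end-of-file raises SIGBUS). The helper names map_file_shared_rw and flush_shared are hypothetical. Because the descriptor is opened O_RDWR and the mapping is MAP_SHARED, modified pages can be written back to the file itself, and msync() forces that write-back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the first `size` bytes of an existing file
 * read/write and shared. Because the mapping is MAP_SHARED on an O_RDWR
 * descriptor, modified pages are written back to the file itself rather
 * than needing swap space. Returns MAP_FAILED (with errno set) on error. */
static void *map_file_shared_rw(const char *path, size_t size)
{
    assert(path != NULL && size > 0);

    int fd = open(path, O_RDWR);       /* a shared writable map needs O_RDWR */
    if (fd == -1)
        return MAP_FAILED;

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int saved = errno;                 /* keep mmap()'s errno across close() */
    close(fd);
    errno = saved;
    return p;
}

/* Hypothetical helper: force dirty pages of a shared mapping out to the file. */
static int flush_shared(void *p, size_t size)
{
    return msync(p, size, MS_SYNC);
}
```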

It's also theoretically possible for an mmap(..., PROT_READ, MAP_PRIVATE, ...) mapping to be created without reserving swap space: the process cannot modify the data, so any page that is evicted could simply be reread from the file. Whether that is allowed depends on what "read-only" is taken to mean: the data in the file as it was when first read from disk, or whatever the file contains later, when a page has to be reread after being evicted.

Neither Linux:

It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.

nor POSIX specifies the behavior:

It is unspecified whether modifications to the underlying object done after the MAP_PRIVATE mapping is established are visible through the MAP_PRIVATE mapping.

The Linux man page does state the following for MAP_PRIVATE:

Create a private copy-on-write mapping. Updates to the mapping are not visible to other processes mapping the same file

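The phrase "copy-on-write mapping" might suggest that Linux always reserves swap space for a MAP_PRIVATE mapping. In fact, Linux's overcommit accounting only charges private writable mappings: a MAP_PRIVATE mapping created with PROT_READ alone carries no commit charge, which is consistent with the PROT_READ fix the asker reports in the comments.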

MAP_NORESERVE

Note also that Linux provides the MAP_NORESERVE option:

Do not reserve swap space for this mapping. When swap space is reserved, one has the guarantee that it is possible to modify the mapping. When swap space is not reserved one might get SIGSEGV upon a write if no physical memory is available. See also the discussion of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before 2.6, this flag had effect only for private writable mappings.
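A Linux-only sketch that keeps the question's private writable mapping but adds MAP_NORESERVE; the helper name map_private_noreserve is hypothetical. According to proc(5), the flag is ignored when /proc/sys/vm/overcommit_memory is 2 (strict accounting), and, as the quoted man page warns, without a reservation a write can raise SIGSEGV if no memory is available. Writes remain private: they change the process's copy, never the file.

```c
#define _GNU_SOURCE                    /* exposes the Linux-only MAP_NORESERVE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: the question's private writable mapping of a
 * read-only descriptor, plus MAP_NORESERVE so the kernel does not reserve
 * swap space for it. Writes land in the process's private copy, never in
 * the file; with no reservation, a write may get SIGSEGV if memory runs out.
 * Returns MAP_FAILED (with errno set) on error. */
static void *map_private_noreserve(const char *path, size_t size)
{
    assert(path != NULL && size > 0);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return MAP_FAILED;

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_NORESERVE, fd, 0);
    int saved = errno;                 /* keep mmap()'s errno across close() */
    close(fd);
    errno = saved;
    return p;
}
```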

Andrew Henle

One possible cause is the setting of /proc/sys/vm/overcommit_memory. A full explanation can be found in this answer.

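Essentially, in the default mode (0) the kernel applies a heuristic overcommit check instead of allowing anything that fits in the theoretical virtual address space, and that heuristic refuses a mapping this large. Setting the value to 1 ("always overcommit") fixes it.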
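To see which mode a machine is in before changing anything, the value can simply be read. A minimal sketch; read_overcommit_mode and OVERCOMMIT_PATH are hypothetical names, and the path is a parameter only so the function can be pointed at an ordinary test file.

```c
#include <assert.h>
#include <stdio.h>

#define OVERCOMMIT_PATH "/proc/sys/vm/overcommit_memory"

/* Hypothetical helper: read the overcommit mode from `path` (normally
 * OVERCOMMIT_PATH). Returns 0 (heuristic, the default), 1 (always
 * overcommit) or 2 (strict accounting), or -1 if the file cannot be read
 * or holds something else, e.g. on a non-Linux system. */
static int read_overcommit_mode(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1 || mode < 0 || mode > 2)
        mode = -1;
    fclose(f);
    return mode;
}
```

Changing the value requires root, for example sysctl -w vm.overcommit_memory=1, and it affects every process on the machine; mapping the file with PROT_READ only, as Andrew Henle suggests in the comments, needs no system-wide change.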

Tarick Welling
  • Try mapping the file read/write. Mapping the file read-only means anonymous backing store for possible changes to the data after it's mapped needs to be reserved. If you map it read/write, the file itself is the backing store and anonymous backing store (or swap space) does not need to be reserved. – Andrew Henle Jan 24 '23 at 22:21
  • @AndrewHenle I changed it to the same construction as the one below, i.e. `writable ? PROT_READ | PROT_WRITE : PROT_READ`, and it works. I'm speechless; what an utterly useless error message this is. Please add this as an answer so I can accept it. This is the proper solution, not what I linked (although that also works). – Tarick Welling Jan 24 '23 at 22:36
  • I've composed an answer as you requested. The problem from the application's point of view is that it just sees a failed call to `mmap()` with whatever `errno` the kernel was kind enough to supply. – Andrew Henle Jan 24 '23 at 23:02