Is the Linux mremap(2) system call able to move a HugeTLB mapping obtained from mmap() to a new fixed virtual address?

(Background: I want to remap the virtual address based on the physical address of the memory I get. This is to efficiently perform virtual to physical address translations by inspecting pointer addresses directly. I will use the memory for DMA to hardware from userspace.)
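
For context, the per-pointer lookup I want to avoid looks roughly like this, using /proc/self/pagemap (a sketch; virt_to_phys is an illustrative helper, not part of the test program below, and reading PFNs from pagemap generally requires root):

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Illustrative helper: resolve a virtual address to a physical one via
   /proc/self/pagemap. Each 64-bit entry holds the PFN in bits 0-54 and
   a "page present" flag in bit 63. */
static uint64_t virt_to_phys(void *virt) {
  long page_size = sysconf(_SC_PAGESIZE);
  uint64_t entry = 0;
  int fd = open("/proc/self/pagemap", O_RDONLY);
  if (fd < 0)
    return 0;
  off_t offset = ((uintptr_t)virt / page_size) * sizeof(entry);
  if (pread(fd, &entry, sizeof(entry), offset) != (ssize_t)sizeof(entry))
    entry = 0;
  close(fd);
  if (!(entry & (1ULL << 63)))
    return 0;                                  /* page not present */
  uint64_t pfn = entry & ((1ULL << 55) - 1);   /* bits 0-54: PFN */
  return pfn * page_size + (uintptr_t)virt % page_size;
}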

This does not seem to work with my simple test program:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <stdint.h>

#define LARGE_PAGE_SIZE (1024*1024*1024)

int main(void) {
  void *p1;
  void *p2;
  /* Let the kernel choose the address for a huge page mapping. */
  p1 = mmap(NULL, LARGE_PAGE_SIZE, PROT_READ|PROT_WRITE,
            MAP_SHARED|MAP_ANONYMOUS|MAP_HUGETLB|MAP_LOCKED,
            -1, 0);
  if (p1 == MAP_FAILED) {
    perror("mmap");
    return 1;
  }
  printf("p1 = %p\n", p1);
  /* Try to move the mapping to an address derived from p1. */
  p2 = mremap(p1, LARGE_PAGE_SIZE, LARGE_PAGE_SIZE,
              MREMAP_MAYMOVE|MREMAP_FIXED,
              (void*)(((uint64_t)p1) | 0x500000000000ULL));
  if (p2 == MAP_FAILED) {
    perror("mremap");
    return 1;
  }
  printf("p2 = %p\n", p2);
  return 0;
}

The mmap() succeeds but the mremap() fails:

$ gcc -o mremap_hugetlb mremap_hugetlb.c && sudo ./mremap_hugetlb
p1 = 0x2aaac0000000
mremap: Invalid argument

Note that the new address is calculated from the one returned by the original mmap(). This is significant: the desired address is not known ahead of time, so I can't simply pass MAP_FIXED to mmap().

The workaround I currently use is to make the mmap() file-backed, so that I can mmap() the file a second time at a fixed address and then munmap() the old mapping. This is suboptimal because it requires finding a mounted hugetlbfs filesystem, and I don't like the complexity of that dependency.

Current code based on the workaround: https://github.com/lukego/snabbswitch/blob/straightline/src/core/memory.c#L56
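
In outline, the workaround does roughly this (a simplified sketch; "/mnt/huge/dma-buffer" is a stand-in for a file on whatever hugetlbfs mount point is found):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define LARGE_PAGE_SIZE (1024*1024*1024)

int main(void) {
  /* Create a file on a hugetlbfs mount; unlink it so the pages are
     freed once the mappings are gone. */
  int fd = open("/mnt/huge/dma-buffer", O_CREAT | O_RDWR, 0600);
  if (fd < 0) { perror("open"); return 1; }
  unlink("/mnt/huge/dma-buffer");
  /* First mapping: let the kernel pick the address. */
  void *p1 = mmap(NULL, LARGE_PAGE_SIZE, PROT_READ | PROT_WRITE,
                  MAP_SHARED, fd, 0);
  if (p1 == MAP_FAILED) { perror("mmap"); return 1; }
  /* Map the same file again at the address derived from p1, then
     drop the original mapping. */
  void *want = (void *)(((uint64_t)p1) | 0x500000000000ULL);
  void *p2 = mmap(want, LARGE_PAGE_SIZE, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_FIXED, fd, 0);
  if (p2 == MAP_FAILED) { perror("mmap fixed"); return 1; }
  munmap(p1, LARGE_PAGE_SIZE);
  printf("p1 = %p\np2 = %p\n", p1, p2);
  return 0;
}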

– Luke Gorrie

2 Answers


Right now it looks like you do have to use hugetlbfs.

Unless I'm mistaken, the problem occurs in the Linux kernel because mm/mremap.c:mremap_to() calls mm/mremap.c:vma_to_resize(), which fails with EINVAL for huge pages.

Perhaps the test is incorrect, or the function lacks code to handle huge pages correctly. It might be worth asking on the linux-kernel and linux-mm mailing lists whether this is a bug that could easily be fixed; however, that won't help users relying on current (and older) kernels.

Remember that when using mmap() on a file descriptor, you usually go through a different code path, as each file system can supply its own mmap handler. For hugetlbfs, the code is in fs/hugetlbfs/inode.c:hugetlbfs_file_mmap(), and, as you said, that code path seems to work for you.

Note that it is best to let the user configure the hugetlbfs mount point instead of scanning /proc/mounts for one, so that the sysadmin can set up multiple hugetlbfs mount points, each with a different configuration, for the services running on the server. (I'm hoping your service does not require running as root.) A discovery helper along those lines is sketched below.
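
For instance, a helper might prefer an operator-supplied location and only fall back to scanning (a sketch; HUGETLBFS_DIR is an illustrative environment variable, not an established convention):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mntent.h>

/* Return the hugetlbfs mount point to use: an explicit setting if
   given, otherwise the first hugetlbfs entry in /proc/mounts. */
static const char *hugetlbfs_mount(char *buf, size_t len) {
  const char *env = getenv("HUGETLBFS_DIR");
  if (env) {
    snprintf(buf, len, "%s", env);
    return buf;
  }
  FILE *f = setmntent("/proc/mounts", "r");
  if (!f)
    return NULL;
  const char *found = NULL;
  struct mntent *m;
  while ((m = getmntent(f)) != NULL) {
    if (strcmp(m->mnt_type, "hugetlbfs") == 0) {
      snprintf(buf, len, "%s", m->mnt_dir);
      found = buf;
      break;
    }
  }
  endmntent(f);
  return found;
}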

– Nominal Animal
  • This is still an issue as of May 2016. mremap-like behavior can also be achieved with fallocate(), but hugetlbfs only recently gained fallocate() support, and it covers only preallocation and hole punching; insert range and collapse range don't seem to be supported, from my reading of the code: http://lxr.free-electrons.com/source/fs/hugetlbfs/inode.c#L547 – Eloff May 12 '16 at 00:26

I have found a solution that seems better: System V shared memory (shm).

The shm API can allocate HugeTLB pages and map them multiple times even when no hugetlbfs filesystem is mounted. I allocate the huge pages with shmget() (passing SHM_HUGETLB) and can then attach them any number of times, at addresses of my choosing, with shmat().
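
A sketch of the approach, mirroring the address arithmetic from my test program (error handling trimmed to the essentials):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define LARGE_PAGE_SIZE (1024*1024*1024)

int main(void) {
  /* Allocate huge pages with System V shm; no hugetlbfs mount needed. */
  int id = shmget(IPC_PRIVATE, LARGE_PAGE_SIZE,
                  IPC_CREAT | SHM_HUGETLB | 0600);
  if (id < 0) { perror("shmget"); return 1; }
  /* First attach anywhere to learn a kernel-chosen address. */
  void *p1 = shmat(id, NULL, 0);
  if (p1 == (void *)-1) { perror("shmat"); return 1; }
  /* Attach again at the address derived from p1 (it stays huge-page
     aligned, which shmat() requires for SHM_HUGETLB segments). */
  void *want = (void *)(((uint64_t)p1) | 0x500000000000ULL);
  void *p2 = shmat(id, want, 0);
  if (p2 == (void *)-1) { perror("shmat fixed"); return 1; }
  shmdt(p1);                      /* drop the original mapping */
  shmctl(id, IPC_RMID, NULL);     /* free pages after last detach */
  printf("p1 = %p\np2 = %p\n", p1, p2);
  return 0;
}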

– Luke Gorrie