Problems with the provided code example in the question
On my system, your code prints:
-1
Invalid argument
I don't see how it could work in the first place. madvise does not allocate memory for you; it is used to set policies for existing memory ranges. Passing an uninitialized pointer as the first argument therefore cannot work.
The madvise manual documents the MADV_HUGEPAGE argument:
Enable Transparent Huge Pages (THP) for pages in the range
specified by addr and length. Currently, Transparent Huge
Pages work only with private anonymous pages (see
mmap(2)). The kernel will regularly scan the areas marked
as huge page candidates to replace them with huge pages.
The kernel will also allocate huge pages directly when the
region is naturally aligned to the huge page size (see
posix_memalign(2)).
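To illustrate that madvise only annotates memory that already exists, here is a minimal sketch (it is not the code from the question). The helper names adviseHugePages, mapThenAdvise, and adviseMisalignedAddress are made up for this illustration. The second helper creates an anonymous mapping before advising it; the third passes an address that is not page-aligned, which madvise rejects with EINVAL.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/mman.h>

/* Hypothetical helper: returns 0 on success, else the errno set by madvise. */
int adviseHugePages( void* address, std::size_t length )
{
    return madvise( address, length, MADV_HUGEPAGE ) == 0 ? 0 : errno;
}

/* Create the mapping first, then set the policy on it. */
int mapThenAdvise( std::size_t length )
{
    void* const data = mmap( nullptr, length, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
    if ( data == MAP_FAILED ) {
        return errno;
    }
    const int result = adviseHugePages( data, length );
    munmap( data, length );
    return result;
}

/* An address that is not page-aligned is rejected with EINVAL. */
int adviseMisalignedAddress()
{
    const std::size_t length = 2 * 4096;
    void* const data = mmap( nullptr, length, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
    if ( data == MAP_FAILED ) {
        return errno;
    }
    const int result = adviseHugePages( static_cast<char*>( data ) + 1, 4096 );
    munmap( data, length );
    return result;
}
```

mapThenAdvise should return 0 on a kernel with transparent huge page support; on a kernel built without it, I would expect EINVAL there as well, because MADV_HUGEPAGE is then not a valid advice value.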
How to use permanently reserved huge pages
Here is a rewritten version that uses mmap instead of madvise. With it, I can reproduce your error of Cannot allocate memory:
#include <errno.h>   // errno
#include <string.h>  // strerror
#include <iostream>
#include <sys/mman.h>

int main()
{
    const auto memorySize = 16ULL * 1024ULL * 1024ULL;
    void* data = mmap(
        /* "If addr is NULL, then the kernel chooses the (page-aligned) address at which to create the mapping" */
        nullptr,
        memorySize,
        /* memory protection / permissions */ PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
        /* fd should for compatibility be -1 even though it is ignored for MAP_ANONYMOUS */ -1,
        /* "The offset argument should be zero [when using MAP_ANONYMOUS]." */ 0
    );
    if ( data == MAP_FAILED ) {
        std::cout << "Failed to allocate memory: " << strerror( errno ) << "\n";
        /* There is no mapping to release, so munmap must not be called on MAP_FAILED. */
        return 1;
    }
    std::cout << "Allocated pointer at: " << data << "\n";
    munmap( data, memorySize );
    return 0;
}
That error can be solved by making the kernel actually reserve some huge pages that can then be allocated. Normally, this should be done at boot time, when most memory is still unused, for a better chance of success. In my case, I was able to reserve only 37 huge pages of 2 MiB each, i.e., 74 MiB of memory. I find that surprisingly low because I have 370 MiB "free" and 3.9 GiB "available" memory. Maybe I should close Firefox first and then try to reserve more huge pages, or maybe kswapd can somehow be triggered to defragment memory before reserving more huge pages.
echo 128 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
head /sys/kernel/mm/hugepages/hugepages-2048kB/*
Output:
==> /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages <==
37
==> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages <==
37
==> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages_mempolicy <==
37
==> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_overcommit_hugepages <==
0
==> /sys/kernel/mm/hugepages/hugepages-2048kB/resv_hugepages <==
0
==> /sys/kernel/mm/hugepages/hugepages-2048kB/surplus_hugepages <==
0
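The 16 MiB mapping from the mmap example needs 8 huge pages of 2 MiB each, so the 37 pages in the pool are enough. The following is a rough sketch of how a program could check this itself before calling mmap. The function names are hypothetical, and the 2 MiB huge page size is an assumption taken from the hugepages-2048kB directory shown above.

```cpp
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>

/* Number of huge pages needed to back `bytes` of memory (rounded up). */
constexpr std::uint64_t hugePagesNeeded( std::uint64_t bytes, std::uint64_t hugePageSize )
{
    return ( bytes + hugePageSize - 1 ) / hugePageSize;
}

/* Reads a single integer counter such as free_hugepages; empty if unreadable. */
std::optional<std::uint64_t> readCounter( const std::string& path )
{
    std::ifstream file( path );
    std::uint64_t value = 0;
    if ( file >> value ) {
        return value;
    }
    return std::nullopt;
}

/* True if the 2 MiB pool currently has enough free pages for the request. */
bool enoughFreeHugePages( std::uint64_t bytes )
{
    constexpr std::uint64_t hugePageSize = 2ULL * 1024ULL * 1024ULL;
    const auto freePages =
        readCounter( "/sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages" );
    return freePages && ( *freePages >= hugePagesNeeded( bytes, hugePageSize ) );
}
```

Calling enoughFreeHugePages( 16ULL * 1024ULL * 1024ULL ) before the mmap call would make it possible to print a more helpful message than Cannot allocate memory.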
Now when I run the code snippet with clang++ hugePages.cpp && ./a.out, I get this output:
Allocated pointer at: 0x7f4454e00000
As can be seen from the trailing zeros, the address is aligned to a fairly large boundary of 2 MiB.
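The alignment can also be checked with a bit mask instead of counting zeros. This is a small sketch; isAligned is a name I made up, and the check is only valid for power-of-two alignments.

```cpp
#include <cstdint>

/* True if `address` is a multiple of `alignment`, which must be a power of two. */
constexpr bool isAligned( std::uint64_t address, std::uint64_t alignment )
{
    return ( address & ( alignment - 1 ) ) == 0;
}

constexpr std::uint64_t twoMiB = 2ULL * 1024ULL * 1024ULL;
constexpr std::uint64_t fourMiB = 4ULL * 1024ULL * 1024ULL;
```

For the address printed above, isAligned( 0x7f4454e00000, twoMiB ) is true while isAligned( 0x7f4454e00000, fourMiB ) is false, so the mapping is aligned to exactly the 2 MiB huge page size.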
How to use transparent huge pages
I have not seen any system actually using these fixed reserved huge pages. It seems that transparent huge pages have superseded that usage, probably in part because:
Pages that are used as huge pages are reserved inside the kernel and cannot be used for other purposes. Huge pages cannot be swapped out under memory pressure.
To mitigate these complexities, transparent huge pages were introduced:
No application changes need to be made to take advantage of THP, but interested application developers can try to optimize their use of it. A call to madvise() with the MADV_HUGEPAGE flag will mark a memory range as being especially suited to huge pages, while MADV_NOHUGEPAGE will suggest that huge pages are better used elsewhere. For applications that want to use huge pages, use of posix_memalign() can help to ensure that large allocations are aligned to huge page (2MB) boundaries.
That basically says it all, but I think the first statement is no longer true because most systems nowadays are configured to madvise in /sys/kernel/mm/transparent_hugepage/enabled instead of always, which is the setting the statement was probably written for. So, here is another try with madvise:
#include <array>
#include <chrono>
#include <fstream>
#include <iostream>
#include <string_view>
#include <thread>

#include <errno.h>   // errno
#include <stdlib.h>  // posix_memalign, free
#include <string.h>  // strerror
#include <sys/mman.h>

int main()
{
    const auto memorySize = 16ULL * 1024ULL * 1024ULL;
    void* data{ nullptr };
    const auto memalignError = posix_memalign(
        &data, /* alignment equal or higher to huge page size */ 2ULL * 1024ULL * 1024ULL, memorySize );
    if ( memalignError != 0 ) {
        std::cout << "Failed to allocate memory: " << strerror( memalignError ) << "\n";
        return 1;
    }
    std::cout << "Allocated pointer at: " << data << "\n";

    if ( madvise( data, memorySize, MADV_HUGEPAGE ) != 0 ) {
        std::cerr << "Error on madvise: " << strerror( errno ) << "\n";
        free( data );
        return 2;
    }

    const auto intData = reinterpret_cast<int*>( data );
    intData[0] = 3;
    /* This access is at offset 3000 * sizeof( int ) = 3000 * 4 B = 12 kB, i.e.,
     * still in the same 2 MiB page as the access above */
    intData[3000] = 3;
    intData[memorySize / sizeof( int ) / 2] = 3;

    /* Check whether transparent huge pages have been allocated. */
    std::ifstream smapsFile( "/proc/self/smaps" );
    std::array<char, 4096> lineBuffer;
    while ( smapsFile.good() ) {
        /* Getline always appends null. */
        smapsFile.getline( lineBuffer.data(), lineBuffer.size(), '\n' );
        std::string_view line{ lineBuffer.data() };
        if ( line.starts_with( "AnonHugePages:" ) && !line.contains( " 0 kB" ) ) {
            std::cout << "We are successfully using transparent huge pages!\n " << line << "\n";
        }
    }

    /* During this sleep /proc/meminfo and /proc/vmstat can be checked for transparent anonymous huge pages. */
    using namespace std::chrono_literals;
    std::this_thread::sleep_for( 100s );

    /* Read the value back before freeing the memory to avoid a use-after-free. */
    const auto valueIsIntact = intData[3000] == 3;
    free( data );
    return valueIsIntact ? 0 : 3;
}
Running this with clang++ -std=c++2b hugeTransparentPages.cpp && ./a.out (C++23 is necessary for string_view functionality such as contains), the output on my system is:
Allocated pointer at: 0x7f38cd600000
We are successfully using transparent huge pages!
AnonHugePages: 4096 kB
This test was executed while cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages yielded 0, i.e., there were no persistently reserved huge pages.
Note that only two huge pages (4096 kB) out of the requested 16 MiB were actually used because the other pages have not been written to. This is also why the call to madvise is possible and yields huge pages: it has to be done before the actual physical allocation, i.e., before writing to the allocated memory.
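The figure of two huge pages follows from the three writes in the example. Here is a small sketch of that arithmetic; countTouchedHugePages is a hypothetical helper, and it assumes the buffer starts on a huge page boundary, which posix_memalign ensures in the example.

```cpp
#include <cstddef>
#include <set>
#include <vector>

/* Counts how many distinct huge pages a list of byte offsets falls into,
 * assuming offset 0 lies on a huge page boundary. */
std::size_t countTouchedHugePages( const std::vector<std::size_t>& byteOffsets,
                                   std::size_t hugePageSize )
{
    std::set<std::size_t> pageIndexes;
    for ( const auto offset : byteOffsets ) {
        pageIndexes.insert( offset / hugePageSize );
    }
    return pageIndexes.size();
}
```

With a 2 MiB huge page size, the byte offsets 0, 3000 * sizeof( int ), and 8 MiB (the middle of the 16 MiB buffer) fall into huge pages 0, 0, and 4, so two huge pages are touched, which amounts to 2 * 2048 kB = 4096 kB.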
The example code includes a check for transparent huge pages for the process itself. This site lists multiple ways to check the amount of anonymous transparent huge pages that are in use. For example, you can check system-wide with:
grep AnonHugePages /proc/meminfo
What I find interesting is that this is normally 0 kB on my system, and while the example code with madvise is running, it yields 4096 kB.
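The same system-wide value can be read from inside a program. This is only a sketch with made-up function names; it assumes the usual "Key: value kB" line layout of /proc/meminfo.

```cpp
#include <fstream>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

/* Returns the AnonHugePages value in kB from meminfo-formatted input,
 * or an empty optional if there is no such line. */
std::optional<unsigned long> parseAnonHugePagesKiB( std::istream& input )
{
    std::string line;
    while ( std::getline( input, line ) ) {
        std::istringstream fields( line );
        std::string key;
        unsigned long value = 0;
        if ( ( fields >> key >> value ) && ( key == "AnonHugePages:" ) ) {
            return value;
        }
    }
    return std::nullopt;
}

/* Convenience wrapper for the system-wide counter. */
std::optional<unsigned long> readSystemAnonHugePagesKiB()
{
    std::ifstream meminfo( "/proc/meminfo" );
    return parseAnonHugePagesKiB( meminfo );
}
```

Polling readSystemAnonHugePagesKiB() before and during the sleep in the example program should show the same change that grep reports.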
To me, this seems to mean that none of the programs I normally use take advantage of persistent huge pages or of transparent huge pages. I find that very surprising because there should be many use cases in which the advantages of huge pages outweigh their disadvantages (wasted memory).
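One possible explanation is the madvise setting mentioned earlier: in that mode, only regions that a program explicitly marks with MADV_HUGEPAGE are eligible, so programs that never call madvise get no transparent huge pages. The active mode is the bracketed entry in /sys/kernel/mm/transparent_hugepage/enabled, for example "always [madvise] never". Here is a sketch for reading it; the function names are hypothetical.

```cpp
#include <fstream>
#include <string>
#include <string_view>

/* Returns the bracketed entry, e.g. "madvise" for "always [madvise] never",
 * or an empty string if no bracketed entry is found. */
std::string activeThpMode( std::string_view contents )
{
    const auto open = contents.find( '[' );
    if ( open == std::string_view::npos ) {
        return {};
    }
    const auto close = contents.find( ']', open );
    if ( close == std::string_view::npos ) {
        return {};
    }
    return std::string( contents.substr( open + 1, close - open - 1 ) );
}

/* Reads the mode from sysfs; empty if the file does not exist. */
std::string readThpMode()
{
    std::ifstream file( "/sys/kernel/mm/transparent_hugepage/enabled" );
    std::string contents;
    std::getline( file, contents );
    return activeThpMode( contents );
}
```

If readThpMode() returns "madvise", the behavior described above applies; with "always", I would expect AnonHugePages to be nonzero even without any madvise calls.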