
I want to use huge pages or transparent huge pages in my code to optimize the performance of a data structure. But when I call madvise() in my code, it fails with "Cannot allocate memory".

The content of /sys/kernel/mm/transparent_hugepage/enabled is: always [madvise] never

The content of /sys/kernel/mm/transparent_hugepage/defrag is: always defer defer+madvise [madvise] never

#include <iostream>
#include <sys/mman.h>
#include <string.h>

int main()
{
    void* ptr;
    std::cout << madvise(ptr, 1, MADV_HUGEPAGE) << std::endl;
    std::cout << strerror(errno) << std::endl;

    return 0;
}

The result of the above code is:

-1
Cannot allocate memory
Rachid K.
dy66

1 Answer


Problems with the provided code example in the question

On my system, your code prints:

-1
Invalid argument

I don't see how it could work in the first place. madvise does not allocate memory for you; it is used to set policies for existing memory ranges. Therefore, passing an uninitialized pointer as the first argument cannot work.
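The two different error messages can be illustrated with a minimal sketch. According to the madvise manual, EINVAL is reported when addr is not page-aligned and ENOMEM when the addresses in the range are not currently mapped, so an uninitialized pointer may plausibly produce either one depending on the garbage value it happens to hold. The helper names adviseErrno and mapOnePage are made up for this illustration and are not part of any API.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success, otherwise the errno value set by madvise. */
int adviseErrno( void* address, std::size_t length, int advice )
{
    return madvise( address, length, advice ) == 0 ? 0 : errno;
}

/* Hypothetical helper: maps one private anonymous page or returns nullptr on failure.
 * The returned address is page-aligned and refers to an existing mapping,
 * which is what madvise expects as its first argument. */
void* mapOnePage()
{
    const auto pageSize = static_cast<std::size_t>( sysconf( _SC_PAGESIZE ) );
    void* page = mmap( nullptr, pageSize, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
    return page == MAP_FAILED ? nullptr : page;
}
```

With page = mapOnePage(), the call adviseErrno( page, pageSize, MADV_NORMAL ) should return 0, while passing static_cast<char*>( page ) + 1 should return EINVAL because that address is no longer page-aligned. MADV_NORMAL is used here only to keep the sketch independent of the kernel configuration; the manual also lists EINVAL for MADV_HUGEPAGE when the kernel was built without transparent huge page support.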

There exists documentation for the MADV_HUGEPAGE argument in the madvise manual:

Enable Transparent Huge Pages (THP) for pages in the range specified by addr and length. Currently, Transparent Huge Pages work only with private anonymous pages (see mmap(2)). The kernel will regularly scan the areas marked as huge page candidates to replace them with huge pages. The kernel will also allocate huge pages directly when the region is naturally aligned to the huge page size (see posix_memalign(2)).

How to use permanently reserved huge pages

Here is a rewritten version of the code that uses mmap instead of madvise. With it, I can reproduce your error of Cannot allocate memory:

#include <cerrno>     // errno
#include <cstring>    // strerror
#include <iostream>
#include <sys/mman.h>

int main()
{
    const auto memorySize = 16ULL * 1024ULL * 1024ULL;

    void* data = mmap(
        /* "If addr is NULL, then the kernel chooses the (page-aligned) address at which to create the mapping" */
        nullptr,
        memorySize,
        /* memory protection / permissions */ PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
        /* fd should for compatibility be -1 even though it is ignored for MAP_ANONYMOUS */ -1,
        /* "The offset argument should be zero [when using MAP_ANONYMOUS]." */ 0
    );

    if ( data == MAP_FAILED ) {
        std::cout << "Failed to allocate memory: " << strerror( errno ) << "\n";
        return 1;
    }

    std::cout << "Allocated pointer at: " << data << "\n";

    /* Only unmap after a successful mmap; MAP_FAILED is not a valid mapping. */
    munmap( data, memorySize );

    return 0;
}

That error can be solved by making the kernel reserve some huge pages that can then be allocated. Normally, this should be done at boot time, when most memory is still unused, for a better chance of success. In my case, I asked for 128 huge pages but the kernel was only able to reserve 37 huge pages of 2 MiB each, i.e., 74 MiB of memory. I find that surprisingly low because I have 370 MiB "free" and 3.9 GiB "available" memory. Maybe I should close Firefox first and then try to reserve more huge pages, or maybe kswapd can somehow be triggered to defragment memory before reserving more huge pages.

echo 128 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
head /sys/kernel/mm/hugepages/hugepages-2048kB/*

Output:

==> /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages <==
37
==> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages <==
37
==> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages_mempolicy <==
37
==> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_overcommit_hugepages <==
0
==> /sys/kernel/mm/hugepages/hugepages-2048kB/resv_hugepages <==
0
==> /sys/kernel/mm/hugepages/hugepages-2048kB/surplus_hugepages <==
0

Now when I run the code snippet with clang++ hugePages.cpp && ./a.out, I get this output:

Allocated pointer at: 0x7f4454e00000

As can be seen from the trailing zeros, the address is aligned to the huge page size of 2 MiB (0x200000): 0xe00000 is an exact multiple of 0x200000.
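The alignment can also be checked programmatically instead of by eye. This is a small sketch assuming a 64-bit system and a huge page size of 2 MiB; the names HUGE_PAGE_ALIGNMENT and isAligned are chosen for this illustration only.

```cpp
#include <cstdint>

/* Assumed huge page size of 2 MiB, as used throughout this answer. */
constexpr std::uintptr_t HUGE_PAGE_ALIGNMENT = 2ULL * 1024ULL * 1024ULL;

/* Returns true if the address is a multiple of the alignment.
 * The alignment must be a power of two for the bit mask to be valid. */
constexpr bool isAligned( std::uintptr_t address,
                          std::uintptr_t alignment = HUGE_PAGE_ALIGNMENT )
{
    return ( address & ( alignment - 1 ) ) == 0;
}

/* The address printed above is aligned to 2 MiB but not to 4 MiB. */
static_assert( isAligned( 0x7f4454e00000ULL ) );
static_assert( !isAligned( 0x7f4454e00000ULL, 2 * HUGE_PAGE_ALIGNMENT ) );
```

For a pointer returned by mmap, the check would be isAligned( reinterpret_cast<std::uintptr_t>( data ) ).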

How to use transparent huge pages

I have not seen any system actually using these fixed reserved huge pages. It seems that transparent huge pages have superseded that usage. Probably, partly because:

Pages that are used as huge pages are reserved inside the kernel and cannot be used for other purposes. Huge pages cannot be swapped out under memory pressure.

To mitigate these complexities, transparent huge pages were introduced:

No application changes need to be made to take advantage of THP, but interested application developers can try to optimize their use of it. A call to madvise() with the MADV_HUGEPAGE flag will mark a memory range as being especially suited to huge pages, while MADV_NOHUGEPAGE will suggest that huge pages are better used elsewhere. For applications that want to use huge pages, use of posix_memalign() can help to ensure that large allocations are aligned to huge page (2MB) boundaries.

That basically says it all, but I think the first statement is no longer true because most systems nowadays are configured with madvise in /sys/kernel/mm/transparent_hugepage/enabled instead of always, which is the setting the statement was probably written for. So, here is another try with madvise:

#include <array>
#include <chrono>
#include <fstream>
#include <iostream>
#include <string_view>
#include <thread>

#include <errno.h>      // errno
#include <stdlib.h>
#include <string.h>     // strerror
#include <sys/mman.h>

int main()
{
    const auto memorySize = 16ULL * 1024ULL * 1024ULL;

    void* data{ nullptr };
    const auto memalignError = posix_memalign(
        &data, /* alignment equal or higher to huge page size */ 2ULL * 1024ULL * 1024ULL, memorySize );
    if ( memalignError != 0 ) {
        std::cout << "Failed to allocate memory: " << strerror( memalignError ) << "\n";
        return 1;
    }

    std::cout << "Allocated pointer at: " << data << "\n";

    if ( madvise( data, memorySize, MADV_HUGEPAGE ) != 0 ) {
        std::cerr << "Error on madvise: " << strerror( errno ) << "\n";
        return 2;
    }

    const auto intData = reinterpret_cast<int*>( data );
    intData[0] = 3;
    /* This access is at byte offset 3000 * sizeof( int ) = 12 kB (assuming a 4-byte int),
     * i.e., still in the same 2 MiB page as the access above */
    intData[3000] = 3;
    intData[memorySize / sizeof( int ) / 2] = 3;

    /* Check whether transparent huge pages have been allocated. */
    std::ifstream smapsFile( "/proc/self/smaps" );
    std::array<char, 4096> lineBuffer;
    while ( smapsFile.good() ) {
        /* Getline always appends null. */
        smapsFile.getline( lineBuffer.data(), lineBuffer.size(), '\n' );
        std::string_view line{ lineBuffer.data() };
        if ( line.starts_with( "AnonHugePages:" ) && !line.contains( " 0 kB" ) ) {
            std::cout << "We are successfully using transparent huge pages!\n    " << line << "\n";
        }
    }

    /* During this sleep /proc/meminfo and /proc/vmstat can be checked for transparent anonymous huge pages. */
    using namespace std::chrono_literals;
    std::this_thread::sleep_for( 100s );

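    /* Read the value back before freeing the memory to avoid a use-after-free. */
    const auto valueIsIntact = intData[3000] == 3;
    free( data );

    return valueIsIntact ? 0 : 3;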
}

Running this with clang++ -std=c++2b hugeTransparentPages.cpp && ./a.out (C++23 is required because of std::string_view::contains; starts_with is already available in C++20), the output on my system is:

Allocated pointer at: 0x7f38cd600000
We are successfully using transparent huge pages!
    AnonHugePages:      4096 kB

And this test was executed while cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages yields 0, i.e., there are no persistently reserved huge pages.
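If no C++23 compiler is at hand, the line check from the example can be sketched with C++17 facilities instead. The function name parseAnonHugePagesKilobytes is hypothetical; it parses the number rather than searching for " 0 kB", which is a design choice of this sketch and not what the code above does.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

/* Hypothetical helper: returns the value in kB of an "AnonHugePages:" line
 * as found in /proc/self/smaps, or std::nullopt for any other line. */
std::optional<unsigned long> parseAnonHugePagesKilobytes( std::string_view line )
{
    constexpr std::string_view key{ "AnonHugePages:" };
    if ( line.substr( 0, key.size() ) != key ) {
        return std::nullopt;
    }
    line.remove_prefix( key.size() );

    /* Skip the padding between the key and the number. */
    const auto firstDigit = line.find_first_not_of( " \t" );
    if ( firstDigit == std::string_view::npos ) {
        return std::nullopt;
    }
    line.remove_prefix( firstDigit );

    /* from_chars stops at the first non-digit, i.e., before the " kB" suffix. */
    unsigned long value = 0;
    const auto result = std::from_chars( line.data(), line.data() + line.size(), value );
    if ( result.ec != std::errc() ) {
        return std::nullopt;
    }
    return value;
}
```

Inside the loop over the smaps lines, a result that has a value greater than 0 would indicate that the corresponding mapping is backed by transparent huge pages.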

Note that only two huge pages (2 * 2048 kB = 4096 kB) out of the requested 16 MiB are actually backed by physical memory because the other pages have not been written to. This is also why the call to madvise still takes effect and yields huge pages: it has to be done before the actual physical allocation, i.e., before the first write to the allocated memory.
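The figure of 4096 kB can be reproduced with a little arithmetic on the three write offsets from the example. This sketch assumes a 4-byte int, a huge page size of 2 MiB, and a buffer that starts on a 2 MiB boundary (which the posix_memalign call requests); the names HUGE_PAGE_BYTES and countTouchedHugePages are made up for the illustration.

```cpp
#include <cstddef>
#include <set>
#include <vector>

/* Assumed huge page size of 2 MiB. */
constexpr std::size_t HUGE_PAGE_BYTES = 2ULL * 1024ULL * 1024ULL;

/* Hypothetical helper: counts how many distinct huge pages are hit by writes
 * at the given byte offsets into a buffer aligned to the huge page size. */
std::size_t countTouchedHugePages( const std::vector<std::size_t>& byteOffsets )
{
    std::set<std::size_t> pageIndexes;
    for ( const auto offset : byteOffsets ) {
        pageIndexes.insert( offset / HUGE_PAGE_BYTES );
    }
    return pageIndexes.size();
}
```

The writes to intData[0], intData[3000], and intData[memorySize / sizeof( int ) / 2] correspond to the byte offsets 0, 12000, and 8 MiB, i.e., huge page indexes 0, 0, and 4. That makes two distinct huge pages, which matches the reported 4096 kB.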

The example code includes a check for transparent huge pages for the process itself. This site lists multiple ways to check the amount of anonymous transparent huge pages that are in use. For example, you can check system-wide with:

grep AnonHugePages /proc/meminfo

What I find interesting is that normally, this is 0 kB on my system and while the example code with madvise is running it yields 4096 kB.

To me, it seems like this means that none of my normally used programs use any persistent huge pages and also no transparent huge pages. I find that very surprising because there should be a lot of use cases for which huge page advantages should outstrip their disadvantages (wasted memory).

mxmlnkn