Best option: get the author of this buggy program to fix their code to be 64-bit clean, or use Linux's x32 ABI (gcc -mx32) for 64-bit registers but 32-bit pointers.
AFAIK, there's no simple way to do exactly what you're asking with regular glibc malloc. Replacing malloc/free with mmap(MAP_32BIT) would probably work, but be horrible for small allocations because it can only allocate in 4k chunks.
The calls would be mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) / munmap. It's not a drop-in replacement because some code probably needs pointers that are compatible with free.
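A minimal sketch of that mmap / munmap pair, assuming x86-64 Linux with glibc headers. The helper names alloc_low32 and free_low32 are made up for illustration; nothing in glibc provides them.

```c
#define _GNU_SOURCE          /* make sure MAP_ANONYMOUS and MAP_32BIT are visible */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: get `size` bytes of zeroed memory whose address fits
   in 32 bits. Returns NULL on failure. The kernel rounds every mapping up
   to whole pages, so a 16-byte request still costs a full 4k page. */
void *alloc_low32(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* Verify the kernel honoured the flag before handing the pointer out. */
    assert(((uintptr_t)p >> 32) == 0);
    return p;
}

/* Hypothetical helper: release a block obtained from alloc_low32.
   The caller has to remember the size, and must never pass these
   pointers to free(). Returns 0 on success, -1 on error. */
int free_low32(void *p, size_t size)
{
    return munmap(p, size);
}
```

Having to carry the size around for munmap is part of why this isn't a drop-in replacement for malloc/free.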
Or if you can find a custom malloc that uses MAP_32BIT as a workaround for buggy software that isn't 64-bit clean, you could use that as a drop-in replacement. Or modifying a custom-malloc library like tcmalloc to add MAP_32BIT might be a lot easier than building a whole custom glibc.
@Basile Starynkevitch suggests maybe using mmap(MAP_FIXED|MAP_NORESERVE) to map a huge range so all that's left is the low 32 bits. (Then never touch this mapping.) Related: The range of virtual memory address in userspace. But that will only work if done before any libraries are loaded at high addresses (How to limit a 64-bit process address space to less than 4G? suggests a possible way to prelink to get libraries loaded to low addresses).
MAP_FIXED will replace existing mappings in the range, so probably you want to omit MAP_FIXED and just give a non-NULL hint address and check that it mapped at the address you requested.
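The hint-and-check approach could look roughly like this. It's a sketch assuming Linux; reserve_range is a hypothetical helper name, and the range you pass in (for example everything from 4 GiB up) is your own choice, not something the kernel provides.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve [hint, hint+len) as inaccessible address
   space so later allocations can't land there. Returns 0 if the mapping was
   placed exactly at `hint`, or -1 if not (for example because something is
   already mapped in that range). */
int reserve_range(void *hint, size_t len)
{
    /* A NULL hint means "anywhere", which would defeat the purpose. */
    assert(hint != NULL);

    /* No MAP_FIXED: if the range is occupied, the kernel picks some other
       address instead of silently replacing the existing mapping. */
    void *p = mmap(hint, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (p != hint) {
        munmap(p, len);      /* landed somewhere else: undo and report failure */
        return -1;
    }
    return 0;
}
```

A -1 return tells you something was already mapped there (a library, the stack, ...), which is exactly the case where MAP_FIXED would have clobbered it.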
If you have code that's not 64-bit clean for pointers, but you still want to take advantage of 64-bit registers for efficient [u]int64_t, Linux's x32 ABI might be exactly what you need.
x32 is an ILP32 ABI for x86-64 long mode, so pointers, size_t, and long are 32-bit types, but 64-bit integer types like long long and uint64_t can use 64-bit registers. The CPU is running in 64-bit long mode, and the ABI uses the same efficient register-args calling convention as the regular x86-64 System V ABI.
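To see which of these ABIs a given build actually targets, something like the following sketch can help. It relies on the __x86_64__, __ILP32__ and __i386__ macros that GCC and Clang predefine; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: name the ABI this file was compiled for.
   x32 defines both __x86_64__ and __ILP32__. */
const char *abi_name(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32";      /* gcc -mx32: long mode, 32-bit pointers */
#elif defined(__x86_64__)
    return "x86-64";   /* gcc -m64: 64-bit pointers */
#elif defined(__i386__)
    return "i386";     /* gcc -m32: legacy 32-bit mode */
#else
    return "other";
#endif
}

/* Checks that hold under -m64 and -mx32 alike. */
void check_common_widths(void)
{
    assert(sizeof(uint64_t) == 8);
    assert(sizeof(long long) == 8);
    assert(sizeof(size_t) == sizeof(void *));
}

/* A 64-bit add: the operands fit in single registers in long mode,
   whether the pointers are 32-bit (x32) or 64-bit. */
uint64_t add_u64(uint64_t a, uint64_t b)
{
    return a + b;
}
```

Note that building with gcc -mx32 needs x32 versions of the libc headers and libraries installed, and the kernel has to be built with x32 support, so check your distro before relying on it.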
Do not confuse x32 with the legacy 32-bit i386 ABI. They're totally unrelated. x32 is a minor modification to the regular x86-64 ABI.
The usual reason for using x32 is a smaller cache footprint with pointer-heavy data structures, increasing cache hits and saving memory bandwidth.
Re "I don't have the Intel SVML runtime for x32":
Intel's compiler and libraries, including SVML, support x32 from version 16.0 onward; see Intel's x32 psABI Support page. If you have a version older than 16.0, this might be a good reason to upgrade.
(That page seems to say OpenMP might not be supported on x32, at least in version 16.0. That would be a problem if I'm reading that right. Current is 19.01, maybe it's working now.)
Notice that the asm output for a function that adds two uint64_t args is identical for icc -O3 -mx32 and icc -O3 -m64, both using add rdi, rsi / mov rax,rdi. (While ICC is good at auto-vectorizing and auto-parallelizing, apparently it's bad at spotting lea rax, [rdi+rsi] as a peephole optimization, and when using mov doesn't follow its own optimization-manual advice to copy first and then destroy the copy for more efficient mov-elimination.)
But anyway, current versions of Intel's compiler itself definitely support x32; the asm output from C++ shows that uint64_t is unsigned long long instead of unsigned long.
Getting GCC to use SVML:
GCC has a -mveclibabi=svml option which lets it auto-vectorize using SVML functions. So if x32 ICC has a problem using OpenMP to auto-parallelize, you could try GCC.
gcc -fopenmp -O3 -ffast-math -march=native -mveclibabi=svml should probably be good. (-ffast-math is similar to what ICC enables by default.)
Getting regular 64-bit malloc to return 32-bit pointers
Related: an OS X version of the question: How to 'malloc' within first 4GB on x86_64 (no easy way there either).
I don't think glibc malloc has an option for this.
It would be possible to build your own glibc with a minor change, if you can find the mmap call that malloc uses to get new pages from the OS, and add the MAP_32BIT flag to it.
From the mmap(2) man page, on MAP_32BIT:
"Put the mapping into the first 2 Gigabytes of the process address space. This flag is supported only on x86-64, for 64-bit programs. It was added to allow thread stacks to be allocated somewhere in the first 2 GB of memory, so as to improve context-switch performance on some early 64-bit processors."
If you compile a non-PIE executable, the break should already be in the low 32 bits of address space, so you shouldn't need to stop glibc from using brk() for small allocations.
https://www.gnu.org/software/libc/manual/html_node/Malloc-Tunable-Parameters.html lists the things you can set with a mallopt call, or environment variables, e.g. M_MMAP_THRESHOLD. Setting that to 4k would get glibc to always use mmap for allocations of that size or larger. But there's no 32-bit option.
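For reference, setting that threshold from code is a one-liner with glibc's mallopt. A minimal sketch, assuming glibc (malloc.h and mallopt are not portable); use_mmap_from is a hypothetical wrapper name.

```c
#include <assert.h>
#include <malloc.h>    /* glibc-specific: mallopt, M_MMAP_THRESHOLD */
#include <stdlib.h>

/* Hypothetical wrapper: ask glibc malloc to serve every request of
   `threshold` bytes or more with its own mmap call.
   Returns 1 on success and 0 on error, like mallopt itself. */
int use_mmap_from(int threshold)
{
    assert(threshold >= 0);
    return mallopt(M_MMAP_THRESHOLD, threshold);
}
```

Calling use_mmap_from(4096) early in main only changes when malloc uses mmap, not where the mapping lands, so the returned pointers can still be above 4 GiB.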