
I have a bug in a program: it works fine when built for 32 bits, but only works randomly on 64 bits, because a pointer gets truncated to 32 bits somewhere in the program.

The reason is that a pointer turns to NULL if malloc returns a memory address with a bit set in the upper 32 bits.
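To illustrate the failure mode, here is a hypothetical sketch (not code from the program in question; the function names are made up). Narrowing a 64-bit address to a 32-bit integer keeps only the low 32 bits, so the round trip only works when the upper 32 bits happen to be zero. Plain truncation leaves the low bits rather than exactly NULL, so the precise symptom depends on how the program narrows the pointer.

```c
#include <stdint.h>

/* Simulate storing an address in a 32-bit integer and widening it again.
   Addresses are modelled as uint64_t so the sketch does not depend on the
   pointer width of the build. */
uint64_t truncate_to_32(uint64_t addr) {
    uint32_t narrowed = (uint32_t)addr;   /* upper 32 bits are discarded here */
    return (uint64_t)narrowed;
}

/* Returns 1 if the address survives the round trip unchanged, else 0. */
int survives_truncation(uint64_t addr) {
    return truncate_to_32(addr) == addr;
}
```

An address such as `0x00007f1234567890` (typical of a 64-bit Linux heap or mmap region) does not survive, which would explain why the bug only shows up depending on where the allocator happens to place the block.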

So I found the pointer which triggers the segfault. But it’s not a program I’m involved in (I’m a user, not a developer), and there are no compiler warnings at all.

So instead of spending time I don’t have, how can I just make sure malloc returns a value that can be used as a 32-bit pointer?

Peter Cordes
user2284570
  • There are huge performance benefits from using full 64 bits in the case of this program, so I’d really like not to use x32. – user2284570 Jan 22 '19 at 03:53
  • What's wrong with [the x32 ABI](https://en.wikipedia.org/wiki/X32_ABI) for your use-case? 32-bit pointers in long mode sounds like exactly what you need, unless you also `mmap` directly to make use of the extra address space, even if you have `malloc` using `mmap(MAP_32BIT)`? Or the program fails to take advantage of 64-bit registers if `long` is a 32-bit type? Or did you actually mean i386, not x32? – Peter Cordes Jan 22 '19 at 04:03
  • @PeterCordes the problem is 32 bits intensive maths and 32 bits intensive disc access. – user2284570 Jan 22 '19 at 04:05
  • So how does x86-64 with 64-bit pointers help in a way that x32 (ILP32 in 64-bit mode) doesn't? Have you actually tried `gcc -mx32 -O3 -march=native`? Note `-mx32` not `-m32`. – Peter Cordes Jan 22 '19 at 04:05
  • @PeterCordes 64 bits maths and 64 bits disc access system calls. And yes I did. I also have a version of the Intel compiler which doesn't support x32. So things can really get faster through the Intel proprietary autopar. – user2284570 Jan 22 '19 at 04:07
  • You're not reading what I'm writing. x32 is 64-bit mode. Go read https://en.wikipedia.org/wiki/X32_ABI. It's an ILP32 ABI for 64-bit mode x86-64. uint64_t fits in a single register in x32. It might be *exactly* what you need if you have code that's not 64-bit clean for pointers, but uses lots of 64-bit integer math. (edit in response to your comment edit: unless your compiler doesn't support x32, that's a problem.) If your compiler doesn't support x32, then obviously you haven't tried it... So I think you were confusing x32 (`-mx32`) with crappy old i386 32-bit code (`-m32`) – Peter Cordes Jan 22 '19 at 04:08
  • @PeterCordes yes I read x32 has more general purpose registers than plain i686 and also offers instructions not available for i686. And even if it were 64 bits, it's still single threaded because of gcc whereas Intel can perform more powerful compile optimizations (I don't have the svml runtime for x32). **I mean I am stuck at gcc for x32 mode**. – user2284570 Jan 22 '19 at 04:15
  • x32 is exactly like regular `-m64` except that pointers, `size_t`, and `long` are 32-bit types. It *is* 64-bit. So the actual obstacle is that you don't have an x32 SVML. I agree that's a showstopper, but please stop being wrong about x32 by saying things like "even if it were 64 bits". Look at the identical asm you get for `uint64_t` a+b from ICC16 and later: https://godbolt.org/z/j8gIuh. Intel's compiler *does* support x32, and SVML is available for x32 ([or at least it was as of version 16.0](https://software.intel.com/en-us/node/628948)), so hopefully you can just go download it. – Peter Cordes Jan 22 '19 at 04:22
  • Also see [How to use 32-bit pointers in 64-bit application?](https://stackoverflow.com/q/10083744/608639) and [Can you enter x64 32-bit “long compatibility sub-mode” outside of kernel mode?](https://stackoverflow.com/q/12716419/608639) I _thought_ ICC allows you to effectively use a 32-bit address space for 64-bit programs, but I can't find the reference at the moment. On Windows I believe Intel provides a driver for it. Otherwise Peter's [x32 suggestion](https://stackoverflow.com/q/7635013/608639) is probably the best option you have. – jww Jan 22 '19 at 05:22
  • @user2284570 - Another option is, link to the program's {GitHub|BitBucket|etc} and point to the problem. Maybe someone here can drop a patch for you. – jww Jan 22 '19 at 05:31
  • @jww it’s a zip file on an invite only forum. I couldn't even point out the problem. icc allows building for x32 but I just don’t have Intel runtime libraries for x32 with my version (I don’t know where to download them). – user2284570 Jan 22 '19 at 10:25
  • @PeterCordes SVML is supported on x32 but I simply don’t have the x32 version for linking and I don’t know where to download it. – user2284570 Jan 22 '19 at 10:31
  • You might want to post on the Intel forums to ask where you can download SVML for x32. I googled a bit, but other than that x32 psABI page for ICC 16.0, which seems to be part of some embedded / bi-endian thing, I didn't find anything by googling. – Peter Cordes Jan 22 '19 at 18:34
  • @PeterCordes so I simply admit that allocating a 32 bits pointer is required because 64 bits is the only way forward. A pointer which can be passed to `free()` of course. – user2284570 Jan 22 '19 at 18:36
  • Intel's compilers clearly do support `-mx32`, so it's certainly worth asking on Intel's forums about libraries for it. Maybe you'll hear back before you finish modifying `tcmalloc` to use `MAP_32BIT`. Or take your pick, there are many 3rd-party drop-in malloc replacements and I have no idea if tcmalloc is easier to modify than others. [Is there any better implementation than malloc/calloc for allocating memory in C?](https://stackoverflow.com/q/8892953) lists some – Peter Cordes Jan 22 '19 at 18:44
  • @PeterCordes turns out I'm having an additional problem https://unix.stackexchange.com/q/620680 about using x32 at all. – user2284570 Nov 20 '20 at 15:14

2 Answers


Best option: get the author of this buggy program to fix their code to be 64-bit clean, or use Linux's x32 ABI (`gcc -mx32`) for 64-bit registers but 32-bit pointers.


AFAIK, there's no simple way to do exactly what you're asking with regular glibc `malloc`. Replacing `malloc`/`free` with `mmap(MAP_32BIT)` would probably work, but it would be horrible for small allocations because `mmap` can only allocate in whole 4k pages:
`mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0)` / `munmap`. It's not a drop-in replacement, because some code probably needs pointers that are compatible with `free`.
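A minimal sketch of that `mmap(MAP_32BIT)` approach, assuming Linux on x86-64 with glibc. The helper names `alloc_low32`, `free_low32` and `fits_in_32_bits` are hypothetical, and this is page-granular raw allocation, not a real allocator:

```c
#define _GNU_SOURCE          /* exposes MAP_ANONYMOUS and MAP_32BIT on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: get zero-filled memory at a low address.
   The kernel rounds the size up to whole pages. Returns NULL on failure. */
void *alloc_low32(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}

/* Release a block from alloc_low32(). The caller has to remember the size,
   which is one reason these pointers are not compatible with free(). */
int free_low32(void *p, size_t size) {
    return munmap(p, size);
}

/* Returns 1 if the pointer value fits in 32 bits, else 0. */
int fits_in_32_bits(const void *p) {
    return (uintptr_t)p <= UINT32_MAX;
}
```

Every call costs a system call and at least one page, so this only makes sense for a handful of large allocations, or as the page source underneath a real allocator.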

Or if you can find a custom malloc that uses MAP_32BIT as a workaround for buggy software that isn't 64-bit clean, you could use that as a drop-in replacement. Or modifying a custom-malloc library like tcmalloc to add MAP_32BIT might be a lot easier than building a whole custom glibc.

@Basile Starynkevitch suggests maybe using `mmap(MAP_FIXED|MAP_NORESERVE)` to map a huge range so all that's left is the low 32 bits of address space (and then never touching this mapping). See "The range of virtual memory address in userspace" for how large that range is. But that will only work if done before any libraries are loaded at high addresses ("How to limit a 64-bit process address space to less than 4G?" suggests a possible way to prelink to get libraries loaded at low addresses).

MAP_FIXED will replace existing mappings in the range, so probably you want to omit MAP_FIXED and just give a non-NULL hint address and check that it mapped at the address you requested.
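Roughly, the hint-and-check variant could look like the sketch below. It assumes Linux; `map_at_hint` is a hypothetical name, and using `PROT_NONE` for a never-touched reservation is my choice, not something the linked answers specify.

```c
#define _GNU_SOURCE          /* exposes MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Try to reserve [hint, hint + size) without MAP_FIXED, so existing mappings
   are never replaced. The hint should be page-aligned.
   Returns the hint on success, or NULL if the kernel placed the mapping
   somewhere else (in which case that mapping is released again). */
void *map_at_hint(void *hint, size_t size) {
    void *p = mmap(hint, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (p != hint) {         /* range was (partly) occupied: undo and report */
        munmap(p, size);
        return NULL;
    }
    return p;
}
```

A caller would walk the high part of the address space in chunks and treat a NULL return as "something is already mapped here", rather than silently clobbering it the way `MAP_FIXED` would.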


If you have code that's not 64-bit clean for pointers, but you still want to take advantage of 64-bit registers for efficient [u]int64_t, Linux's x32 ABI might be exactly what you need.

x32 is an ILP32 ABI for x86-64 long mode, so pointers, size_t, and long are 32-bit types, but 64-bit integer types like long long and uint64_t can use 64-bit registers. The CPU is running in 64-bit long mode, and the ABI uses the same efficient register-args calling convention as the regular x86-64 System V ABI.
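A quick way to see which ABI a given build targets is to check the type sizes and the compiler's predefined macros. This is a sketch assuming GCC-compatible macros (`__x86_64__`, `__ILP32__`, `__i386__`); the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Name the ABI based on predefined macros: x32 builds define both
   __x86_64__ and __ILP32__, which i386 builds do not. */
const char *abi_name(void) {
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}

/* Returns 1 when pointers and long are both 4 bytes (ILP32), else 0. */
int is_ilp32(void) {
    return sizeof(void *) == 4 && sizeof(long) == 4;
}

/* Print the sizes that differ between the ABIs, plus the 64-bit types
   that stay 8 bytes everywhere. */
void print_abi_sizes(void) {
    printf("%s: void*=%zu size_t=%zu long=%zu long long=%zu uint64_t=%zu\n",
           abi_name(), sizeof(void *), sizeof(size_t), sizeof(long),
           sizeof(long long), sizeof(uint64_t));
}
```

Built with `gcc -mx32`, pointers, `size_t` and `long` come out as 4 bytes while `long long` and `uint64_t` stay at 8. Building and running an x32 binary also needs x32 libraries and kernel support, which not every distro provides.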

Do not confuse x32 with the legacy 32-bit code i386 ABI. They're totally unrelated. x32 is a minor modification to the regular x86-64 ABI.

The usual reason for using x32 is a smaller cache footprint with pointer-heavy data structures, increasing cache hits and saving memory bandwidth.

> I don't have the Intel SVML runtime for x32

Intel's compiler and libraries including SVML support x32 from version 16.0 onward, see Intel's x32 psABI Support page. If you have a version older than 16.0, this might be a good reason to upgrade.

(That page seems to say OpenMP might not be supported on x32, at least in version 16.0. That would be a problem if I'm reading that right. Current is 19.01, maybe it's working now.)

Notice that the asm output for a function that adds two `uint64_t` args is identical for `icc -O3 -mx32` and `icc -O3 -m64`, both using `add rdi, rsi` / `mov rax,rdi`. (While ICC is good at auto-vectorizing and auto-parallelizing, apparently it's bad at spotting `lea rax, [rdi+rsi]` as a peephole optimization, and when using `mov` doesn't follow its own optimization-manual advice to copy first and then destroy the copy for more efficient mov-elimination.)
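The function being compared is presumably something along these lines (a reconstruction for illustration; the name `add64` is hypothetical and the exact source behind the Godbolt link isn't reproduced here):

```c
#include <stdint.h>

/* Adds two 64-bit integers. In 64-bit mode, which includes x32, each
   argument fits in a single register, so no add/adc pair is needed. */
uint64_t add64(uint64_t a, uint64_t b) {
    return a + b;
}
```

Compiling this with `-mx32` and `-m64` and diffing the asm is an easy sanity check that 64-bit integer math isn't penalized by the 32-bit pointers.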

But anyway, current versions of Intel's compiler itself definitely support x32; the asm output from C++ shows that uint64_t is unsigned long long instead of unsigned long.


Getting GCC to use SVML:

GCC has a `-mveclibabi=svml` option which lets it auto-vectorize using SVML functions. So if x32 ICC has a problem using OpenMP to auto-parallelize, you could try GCC.

`gcc -fopenmp -O3 -ffast-math -march=native -mveclibabi=svml` should probably be good. (`-ffast-math` is similar to what ICC enables by default.)


Getting regular 64-bit malloc to return 32-bit pointers

Related: an OS X version of the question: How to 'malloc' within first 4GB on x86_64 (no easy way there either).

I don't think glibc malloc has an option for this.

It would be possible to build your own glibc with a minor change, if you can find the mmap call that malloc uses to get new pages from the OS, and add the MAP_32BIT flag to it.

From the `mmap(2)` man page, on `MAP_32BIT`:

> Put the mapping into the first 2 Gigabytes of the process address space. This flag is supported only on x86-64, for 64-bit programs. It was added to allow thread stacks to be allocated somewhere in the first 2 GB of memory, so as to improve context-switch performance on some early 64-bit processors.

If you compile a non-PIE executable, the break should be in the low 32 already, so you shouldn't need to stop glibc from using brk() for small allocations.

https://www.gnu.org/software/libc/manual/html_node/Malloc-Tunable-Parameters.html lists the things you can set with a mallopt call, or environment variables, e.g. M_MMAP_THRESHOLD. Setting that to 4k would get glibc to always use mmap for allocations of that size or larger. But there's no 32-bit option.
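For reference, setting that threshold from code looks roughly like this. It's a sketch assuming glibc (`mallopt` and `M_MMAP_THRESHOLD` come from `<malloc.h>`); the wrapper name is hypothetical.

```c
#include <malloc.h>

/* Ask glibc malloc to serve requests of at least `bytes` bytes with mmap
   instead of the brk heap. mallopt() returns 1 on success and 0 on error. */
int set_mmap_threshold(int bytes) {
    return mallopt(M_MMAP_THRESHOLD, bytes);
}
```

Calling `set_mmap_threshold(4096)` early in `main`, or running the program with `MALLOC_MMAP_THRESHOLD_=4096` in the environment, only changes which allocations go through `mmap`. By itself it does nothing to keep the resulting addresses in the low 32 bits.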

Peter Cordes
  • Maybe worth mentioning... *"The `MAP_32BIT` flag is ignored when `MAP_FIXED` is set"* per [`mmap(2)` man page](http://man7.org/linux/man-pages/man2/mmap.2.html). – jww Jan 22 '19 at 06:54
  • @jww: Makes sense, the kernel doesn't make any choices about where to allocate in that case. MAP_FIXED disables the fallback to kernel-chosen address if the hint address would cause a collision with existing mappings. So the only way it could apply would be to return an error if passed a fixed address outside the low 32. (Or if the *end* of the mapping extended outside the low 32). – Peter Cordes Jan 22 '19 at 07:04
  • What I’m looking for is rather for icc to not use SVML. But intensive optimizations turns on SVML only code generation. – user2284570 Jan 22 '19 at 10:37

The kernel is supposed to support this using various personality flags: ADDR_LIMIT_32BIT, ADDR_LIMIT_3GB, PER_LINUX32, PER_LINUX32_3GB, PER_LINUX_32BIT. The setarch linux32 -B command calls personality(PER_LINUX32|ADDR_LIMIT_32BIT), but this request is ignored by the kernel on x86-64:

$ setarch x86_64 -B grep stack /proc/self/maps  
7fff38461000-7fff38482000 rw-p 00000000 00:00 0                          [stack]

I think this was only implemented for other 64-bit architectures to support porting 32-bit software with pointer truncation problems.
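For completeness, the underlying calls look like the sketch below, assuming glibc's `<sys/personality.h>`; the wrapper names are hypothetical. Passing `0xffffffff` queries the current persona without changing it.

```c
#include <sys/personality.h>

/* Return the current persona without modifying it, or -1 on error. */
int query_persona(void) {
    return personality(0xffffffff);
}

/* Request the persona described above (PER_LINUX32 with ADDR_LIMIT_32BIT).
   Returns the previous persona on success, or -1 on error. As noted above,
   x86-64 kernels do not actually restrict addresses in response. */
int request_32bit_address_limit(void) {
    return personality(PER_LINUX32 | ADDR_LIMIT_32BIT);
}
```

As far as I understand, `setarch` makes this call and then execs the target program, because the address-space layout is set up at exec time. A process that changes its own persona after startup shouldn't expect its existing mappings to move.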

Florian Weimer
  • Hmm, yes, but my primary objective is to get the program running fast instead of taking days, without spending a lot of time on it. I fear creating an allocator would consume more time than fixing the actual error. – user2284570 Jan 25 '19 at 23:58
  • Thanks for posting this; I'd wondered if that `personality` stuff did anything, and what it was for! Some other 64-bit ISAs, like MIPS, extend to 64-bit *without* introducing a new mode. It would make a lot more sense there to be sure to distinguish whether user-space is running code that's 64-bit-aware or not, because the kernel *doesn't* need to do an equivalent of x86's using a different CS value to select 32 or 64-bit mode for user-space. – Peter Cordes Jan 26 '19 at 11:20
  • (On MIPS, they extended the ISA so add/sub etc. naturally keep register values sign-extended to 64-bit, adding `dadd` double-word add instructions. Shift instructions do explicitly sign-extend their 32-bit result to 64 bits. See [MOVZX missing 32 bit register to 64 bit register](//stackoverflow.com/q/51387571) for some comparison between MIPS64 and x86-64, it's an interesting ISA design choice, especially given that MIPS had so much opcode coding space left to add doubleword versions of many instructions. But I think AMD64 is a better choice: implicit zero extension is handy.) – Peter Cordes Jan 26 '19 at 11:23
  • @user2284570 The problem is that without kernel help, global variables in shared objects will be mapped high in the address space. You can force heap addresses to the lower 4 GiB with a custom allocator, but the addresses of global variables will not be affected by this. – Florian Weimer Jan 26 '19 at 11:23
  • @FlorianWeimer glibc is the only shared object and is used for performing console writing and ioctls and fread/fwrite outside memory management. – user2284570 Jan 26 '19 at 16:37
  • @FlorianWeimer things changed a bit with newer Macintosh. Does aarch64 support this? – user2284570 Nov 20 '20 at 15:08