I have an R package with C compiled code that's been relatively stable for quite a while and is frequently tested against a broad variety of platforms and compilers (windows/osx/debian/fedora gcc/clang).
More recently a new platform was added to test the package again:
Logs from checks with gcc trunk aka 10.0.1 compiled from source
on Fedora 30. (For some archived packages, 10.0.0.)
x86_64 Fedora 30 Linux
FFLAGS="-g -O2 -mtune=native -Wall -fallow-argument-mismatch"
CFLAGS="-g -O2 -Wall -pedantic -mtune=native -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong -fstack-clash-protection -fcf-protection"
CXXFLAGS="-g -O2 -Wall -pedantic -mtune=native -Wno-ignored-attributes -Wno-deprecated-declarations -Wno-parentheses -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong -fstack-clash-protection -fcf-protection"
At which point the compiled code promptly started segfaulting along these lines:
*** caught segfault ***
address 0x1d00000001, cause 'memory not mapped'
I've been able to reproduce the segfault consistently by using the rocker/r-base
docker container with gcc-10.0.1
with optimization level -O2
. Running a lower optimization gets rid of the problem. Running any other set-up, including under valgrind (both -O0 and -O2), UBSAN (gcc/clang), shows no problems at all. I'm also reasonably sure this ran under gcc-10.0.0
, but don't have the data.
I ran the gcc-10.0.1 -O2
version with gdb
and noticed something that seems odd to me:
While stepping through the highlighted section it appears the initialization of the second elements of the arrays is skipped (R_alloc
is a wrapper around malloc
that self garbage collects when returning control to R; the segfault happens before return to R). Later, the program crashes when the un-initialized element (in the gcc.10.0.1 -O2 version) is accessed.
I fixed this by explicitly initializing the element in question everywhere in the code that eventually led to the usage of the element, but it really should have been initialized to an empty string, or at least that's what I would have assumed.
Am I missing something obvious or doing something stupid? Both are reasonably likely as C is my second language by far. It's just strange that this just cropped up now, and I can't figure out what the compiler is trying to do.
UPDATE: Instructions to reproduce this, although this will only reproduce so long as debian:testing
docker container has gcc-10
at gcc-10.0.1
. Also, don't just run these commands if you don't trust me.
Sorry this is not a minimal reproducible example.
docker pull rocker/r-base
docker run --rm -ti --security-opt seccomp=unconfined \
rocker/r-base /bin/bash
apt-get update
apt-get install gcc-10 gdb
gcc-10 --version # confirm 10.0.1
# gcc-10 (Debian 10-20200222-1) 10.0.1 20200222 (experimental)
# [master revision 01af7e0a0c2:487fe13f218:e99b18cf7101f205bfdd9f0f29ed51caaec52779]
mkdir ~/.R
touch ~/.R/Makevars
echo "CC = gcc-10
CFLAGS = -g -O2 -Wall -pedantic -mtune=native -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong -fstack-clash-protection -fcf-protection
" >> ~/.R/Makevars
R -d gdb --vanilla
Then in the R console, after typing run
to get gdb
to run the program:
f.dl <- tempfile()
f.uz <- tempfile()
github.url <- 'https://github.com/brodieG/vetr/archive/v0.2.8.zip'
download.file(github.url, f.dl)
unzip(f.dl, exdir=f.uz)
install.packages(
file.path(f.uz, 'vetr-0.2.8'), repos=NULL,
INSTALL_opts="--install-tests", type='source'
)
# minimal set of commands to segfault
library(vetr)
alike(pairlist(a=1, b="character"), pairlist(a=1, b=letters))
alike(pairlist(1, "character"), pairlist(1, letters))
alike(NULL, 1:3) # not a wild card at top level
alike(list(NULL), list(1:3)) # but yes when nested
alike(list(NULL, NULL), list(list(list(1, 2, 3)), 1:25))
alike(list(NULL), list(1, 2))
alike(list(), list(1, 2))
alike(matrix(integer(), ncol=7), matrix(1:21, nrow=3))
alike(matrix(character(), nrow=3), matrix(1:21, nrow=3))
alike(
matrix(integer(), ncol=3, dimnames=list(NULL, c("R", "G", "B"))),
matrix(1:21, ncol=3, dimnames=list(NULL, c("R", "G", "B")))
)
# Adding tests from docs
mx.tpl <- matrix(
integer(), ncol=3, dimnames=list(row.id=NULL, c("R", "G", "B"))
)
mx.cur <- matrix(
sample(0:255, 12), ncol=3, dimnames=list(row.id=1:4, rgb=c("R", "G", "B"))
)
mx.cur2 <-
matrix(sample(0:255, 12), ncol=3, dimnames=list(1:4, c("R", "G", "B")))
alike(mx.tpl, mx.cur2)
Inspecting in gdb pretty quickly shows (if I understand correctly) that
CSR_strmlen_x
is trying to access the string that was not initialized.
UPDATE 2: this is a highly recursive function, and on top of that the string initialization bit gets called many, many times. This is mostly b/c I was being lazy, we only need the strings initialized for the one time we actually encounter something we want to report in the recursion, but it was easier to initialize every time it is possible to encounter something. I mention this because what you'll see next shows multiple initializations, but only one of them (presumably the one with address <0x1400000001>) is being used.
I can't guarantee that the stuff I'm showing here is directly related to the element that caused the segfault (though it is the same illegal address acccess), but as @nate-eldredge asked it does show that the array element is not initialized either just before return or just after return in the calling function. Note the calling function is initializing 8 of these, and I show them all, with all them filled with either garbage or inaccessible memory.
UPDATE 3, disassembly of function in question:
Breakpoint 1, ALIKEC_res_strings_init () at alike.c:75
75 return res;
(gdb) p res.current[0]
$1 = 0x7ffff46a0aa5 "%s%s%s%s"
(gdb) p res.current[1]
$2 = 0x1400000001 <error: Cannot access memory at address 0x1400000001>
(gdb) disas /m ALIKEC_res_strings_init
Dump of assembler code for function ALIKEC_res_strings_init:
53 struct ALIKEC_res_strings ALIKEC_res_strings_init() {
0x00007ffff4687fc0 <+0>: endbr64
54 struct ALIKEC_res_strings res;
55
56 res.target = (const char **) R_alloc(5, sizeof(const char *));
0x00007ffff4687fc4 <+4>: push %r12
0x00007ffff4687fc6 <+6>: mov $0x8,%esi
0x00007ffff4687fcb <+11>: mov %rdi,%r12
0x00007ffff4687fce <+14>: push %rbx
0x00007ffff4687fcf <+15>: mov $0x5,%edi
0x00007ffff4687fd4 <+20>: sub $0x8,%rsp
0x00007ffff4687fd8 <+24>: callq 0x7ffff4687180 <R_alloc@plt>
0x00007ffff4687fdd <+29>: mov $0x8,%esi
0x00007ffff4687fe2 <+34>: mov $0x5,%edi
0x00007ffff4687fe7 <+39>: mov %rax,%rbx
57 res.current = (const char **) R_alloc(5, sizeof(const char *));
0x00007ffff4687fea <+42>: callq 0x7ffff4687180 <R_alloc@plt>
58
59 res.target[0] = "%s%s%s%s";
0x00007ffff4687fef <+47>: lea 0x1764a(%rip),%rdx # 0x7ffff469f640
0x00007ffff4687ff6 <+54>: lea 0x18aa8(%rip),%rcx # 0x7ffff46a0aa5
0x00007ffff4687ffd <+61>: mov %rcx,(%rbx)
60 res.target[1] = "";
61 res.target[2] = "";
0x00007ffff4688000 <+64>: mov %rdx,0x10(%rbx)
62 res.target[3] = "";
0x00007ffff4688004 <+68>: mov %rdx,0x18(%rbx)
63 res.target[4] = "";
0x00007ffff4688008 <+72>: mov %rdx,0x20(%rbx)
64
65 res.tar_pre = "be";
66
67 res.current[0] = "%s%s%s%s";
0x00007ffff468800c <+76>: mov %rax,0x8(%r12)
0x00007ffff4688011 <+81>: mov %rcx,(%rax)
68 res.current[1] = "";
69 res.current[2] = "";
0x00007ffff4688014 <+84>: mov %rdx,0x10(%rax)
70 res.current[3] = "";
0x00007ffff4688018 <+88>: mov %rdx,0x18(%rax)
71 res.current[4] = "";
0x00007ffff468801c <+92>: mov %rdx,0x20(%rax)
72
73 res.cur_pre = "is";
74
75 return res;
=> 0x00007ffff4688020 <+96>: lea 0x14fe0(%rip),%rax # 0x7ffff469d007
0x00007ffff4688027 <+103>: mov %rax,0x10(%r12)
0x00007ffff468802c <+108>: lea 0x14fcd(%rip),%rax # 0x7ffff469d000
0x00007ffff4688033 <+115>: mov %rbx,(%r12)
0x00007ffff4688037 <+119>: mov %rax,0x18(%r12)
0x00007ffff468803c <+124>: add $0x8,%rsp
0x00007ffff4688040 <+128>: pop %rbx
0x00007ffff4688041 <+129>: mov %r12,%rax
0x00007ffff4688044 <+132>: pop %r12
0x00007ffff4688046 <+134>: retq
0x00007ffff4688047: nopw 0x0(%rax,%rax,1)
End of assembler dump.
UPDATE 4:
So, trying to parse through the standard here are the parts of it that seem relevant (C11 draft):
6.3.2.3 Par7 Conversions > Other Operands > Pointers
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned 68) for the referenced type, the behavior is undefined.
Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type,the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
6.5 Par6 Expressions
The effective type of an object for an access to its stored value is the declared type of the object, if any. 87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
87) Allocated objects have no declared type.
IIUC R_alloc
returns an offset into a malloc
ed block that is guaranteed to be double
aligned, and the size of the block after the offset is of the requested size (there is also allocation before the offset for R specific data). R_alloc
casts that pointer to (char *)
on return.
Section 6.2.5 Par 29
A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. 48) Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other.
All pointers to union types shall have the same representation and alignment requirements as each other.
Pointers to other types need not have the same representation or alignment requirements.48) The same representation and alignment requirements are meant to imply interchangeability asarguments to functions, return values from functions, and members of unions.
So the question is "are we allowed to recast the (char *)
to (const char **)
and write to it as (const char **)
". My reading of the above is that so long as pointers on the systems the code run in have alignment compatible with double
alignment, then its okay.
Are we violating "strict aliasing"? i.e.:
6.5 Par 7
An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 88)
— a type compatible with the effective type of the object ...
88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
So, what should the compiler think the effective type of the object pointed to by res.target
(or res.current
) is? Presumably the declared type (const char **)
, or is this actually ambiguous? It feels to me that it isn't in this case only because there is no other 'lvalue' in scope that accesses the same object.
I'll admit I'm struggling mightily to extract sense from these sections of the standard.