4

Wait, come back, I promise this isn't about uninitialized pointers!

The problem

I have written some unit tests using Criterion. The code under test doesn't matter; the problem crops up in the test itself. Here's a simplified version of the test:

#include <stdio.h>
#include <criterion/criterion.h>
#include <criterion/parameterized.h>

typedef struct {
    char *input;
} paramspec;

TestSuite(Example);

ParameterizedTestParameters(Example, test_example) {
    static paramspec params[] = {
        {"this is a test"},
    };

    size_t nb_params = sizeof (params) / sizeof (paramspec);
    return cr_make_param_array(paramspec, params, nb_params);
}

ParameterizedTest(paramspec *param, Example, test_example) {
    printf("input value is: %s\n", param->input);
}

When compiled in an Ubuntu 22.04 container, using either gcc-11 (11.4.0) or gcc-10 (10.5.0), running this test produces:

[====] Running 1 test from Example:
[RUN ] Example::test_example
[----] test_example.c:20: Unexpected signal caught below this line!
[FAIL] Example::test_example: CRASH!
[====] Synthesis: Tested: 1 | Passing: 0 | Failing: 1 | Crashing: 1

It doesn't say it in the output, but that's a SIGSEGV. If I attach to the test with gdb and print the value of *param, I see error: Cannot access memory at address ...:

Thread 1 "test_example" hit Breakpoint 1, Example_test_example_impl (param=0x7ffff7fa1330)
    at test_example.c:21
21          printf("input value is: %s\n", param->input);
(gdb) p *param
$1 = {input = 0x55abdb8e81ec <error: Cannot access memory at address 0x55abdb8e81ec>}

But!

The mystery thickens

If I build the code under Fedora 34 (which I selected because that includes gcc 11.3.1, which is a close match for 11.4.0), the code works just fine:

[====] Running 1 test from Example:
[RUN ] Example::test_example
input value is: this is a test
[PASS] Example::test_example: (0.00s)
[====] Synthesis: Tested: 1 | Passing: 1 | Failing: 0 | Crashing: 0

And that code runs fine not only in the Fedora environment in which it was built -- it also runs without error in the Ubuntu environment!

In both environments, gdb is able to see the string value:

(gdb) p *param
$1 = {input = 0x41aac9 "this is a test"}

The question

What aspect of the build environment is resulting in segfault? That's just a static string being accessed by code in the same file; there's no pointer allocation to go awry or anything like that.

On the Ubuntu side I've built this with gcc-{9,10,11} and the behavior is identical in all cases.

Building with sanitizers

Building the code with -fsanitize=undefined,address results in the following:

==16==ERROR: AddressSanitizer: SEGV on unknown address 0x5594e31d7400 (pc 0x7f1491d4e086 bp 0x7ffdfe1765b0 sp 0x7ffdfe175cf8 T0)
==16==The signal is caused by a READ memory access.
    #0 0x7f1491d4e086 in __sanitizer::internal_strlen(char const*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167
    #1 0x7f1491cdf2ed in printf_common ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors_format.inc:551
    #2 0x7f1491cdf6cc in __interceptor_vprintf ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1660
    #3 0x7f1491cdf7c6 in __interceptor_printf ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1718
    #4 0x555fd432c365 in Example_test_example_impl /src/test_example.c:21
    #5 0x7f1491c18298 in criterion_internal_test_main ../src/core/test.c:97
    #6 0x555fd432c2e7 in Example_test_example_jmp /src/test_example.c:20
    #7 0x7f1491c16849 in run_test_child ../src/core/runner_coroutine.c:230
    #8 0x7f1491c28a92 in bxfi_main ../subprojects/boxfort/src/sandbox.c:57
    #9 0x7f14913ced8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #10 0x7f14913cee3f in __libc_start_main_impl ../csu/libc-start.c:392
    #11 0x555fd432c1a4 in _start (/src/test_example+0x21a4)

AddressSanitizer can not provide additional info.

...but that's pretty much what gdb was telling us earlier. This error only crops up when building on Ubuntu; I don't see a similar error on the Fedora build.

A complete reproducer

If anyone is interested in taking a closer look, I've put together a complete reproducer here, that includes the test code, Makefile, Dockerfile, and a README.

larsks
  • 277,717
  • 41
  • 399
  • 399
  • 1
    One of the things I can think of is that the code has undefined behavior. It'll be pretty hard to rely on consistent behavior from such code. If you show a [mre] it'll be answered quickly I think. – Ted Lyngmo Aug 11 '23 at 00:41
  • All I can guess is that the Criterion macros are generating bad code for the tests. – Barmar Aug 11 '23 at 00:45
  • 3
    *I promise this isn't about uninitialized pointers!* You really can't make that promise until you know what's causing the failure... – Andrew Henle Aug 11 '23 at 00:51
  • @TedLyngmo This is the most minimal example I could put together. If I could write plain C code that would reliably reproduce this problem I would probably have the solution! The GDB behavior in particular makes me doubt this is something funky in the macros, because that's just parroting back the struct that I created...and I can't figure out why with one build I can see the value, and in the other build I can't. – larsks Aug 11 '23 at 00:54
  • Did you try it with the sanitizers? They are crazy good nowadays. – Ted Lyngmo Aug 11 '23 at 01:03
  • 1
    That's a good thought; I'm trying that now. – larsks Aug 11 '23 at 01:05
  • I'd probably try to compare the assembly of the working vs. nonworking binary. – jpalecek Aug 11 '23 at 01:16
  • @TedLyngmo I've added the output from the sanitizer. I don't think it adds any new information. – larsks Aug 11 '23 at 01:17
  • @larsks perhaps not to us without the source code that made that happen - but it _should_ give your some hints. `/src/test_split.c:20` seems like a probable culprit. – Ted Lyngmo Aug 11 '23 at 01:20
  • IMHO it will be something like "the headers on your system don't match the shared library" (which is even more likely if the shared library is in `/usr/local`. Or a compiler bug, but that's unlikely. – jpalecek Aug 11 '23 at 01:26
  • @TedLyngmo `test_split.c:20` is visible in the code included in the question, and that's simply the line that attempts to access `param->input`, and we already know that's where the problem is cropping up. – larsks Aug 11 '23 at 01:27
  • I'm eager to propose dropping the test framework if that's what you get from it. – Ted Lyngmo Aug 11 '23 at 01:28
  • The unreadable address is quite puzzling. That suggests it points outside the program's allocated memory segment. Have you verified that the pointer value of `params` passed into `ParameterizedTest` holds the same address as the static `params` defined in `ParameterizedTestParameters`? Is it possible the alignment of `struct criterion_test_params` is different between your code and the compiled Criterion libs? And/or even possibly one of them having been compiled as C++ instead of C? – paddy Aug 11 '23 at 01:56
  • @paddy the pointer is the same in the both functions (verified by sticking a `fprintf(stderr, "pointer: %p", ...)` in both places. I'm going to look into building criterion from source rather than using the release package. All I wanted was a non-abandoned unit test package for c with automatic test registration and minimal boilerplate :). – larsks Aug 11 '23 at 02:05
  • 1
    If you get to the bottom of this, please do update! I'll be very curious to know what the issue was. – paddy Aug 11 '23 at 03:15
  • When you say "the release package" for Ubuntu 22.04, do you mean the version actually included in Ubuntu as package libcriterion-dev? At any rate, that's what I would recommend, not the generic binary download available from the project's GitHub. – John Bollinger Aug 11 '23 at 03:46
  • @JohnBollinger I have used the `libcriterion-dev` package, the release binary [from github](https://github.com/Snaipe/Criterion/releases/tag/v2.4.2), and I have built and installed criterion from source. In all cases I experience the same issue. – larsks Aug 11 '23 at 03:55
  • Ok. You chose Criterion because it is an active project, and I'd make use of that now. This looks like an issue that Criterion's devs would be best equipped to handle. I suggesting reporting the issue to them. – John Bollinger Aug 11 '23 at 04:10

2 Answers2

1

OK, after having the testcase on hand, this seems easy.

First, I am testing on Debian and noticed that it does crash there.

Second, quite astonishingly it didn't crash when i ran the test binary under GDB. But I noticed this line:

[Detaching after fork from child process 3071548]

So there are multiple processes there. And with strace I found it indeed fork()s + exec()s a child and it's the child which crashes.

So I added a debug printf (don't know why you hadn't done that yourself, from your comment I got the impression that you checked this):

const char* const test_msg = "this is a test";
ParameterizedTestParameters(Example, test_example) {
    static paramspec params[] = {
        {test_msg},
    };

    printf("param is %p, %llx, %p\n", params, *(intptr_t*)params, test_msg);
    size_t nb_params = sizeof (params) / sizeof (paramspec);
    return cr_make_param_array(paramspec, params, nb_params);
}

ParameterizedTest(paramspec *param, Example, test_example) {
    printf("param is %p, %llx, %p\n", param, *(intptr_t*)param, test_msg);

and the penny's beginning to drop:

param is 0x557fdad5e040, 557fdad5c004, 0x557fdad5c004
param is 0x7f764ba45330, 557fdad5c004, 0x5632b4277004

See how the pointer in the param is the same, but the actual address of the string differs. This is caused by ASLR. So the child has static data on a different address than the parent, but the parent passes (through shared memory, it seems) the pointer, which isn't of much use to the child, verbatim.

Your best way to fix it is to use, as the Criterion documentation suggests, a dynamically allocated parameter and the function cr_malloc.

static paramspec params[] = {
    {0},
};
params[0].input = cr_malloc(strlen(test_msg)+1);
strcpy(params[0].input, test_msg);
jpalecek
  • 47,058
  • 7
  • 102
  • 144
  • "don't know why you hadn't done that yourself": I *had* done that myself; I was printing out the value of `params[0].input` in `ParameterizedTestParameters` and `param->input` in `ParameterizedTest`, and the values were *identical*, which is why I was confused. – larsks Aug 11 '23 at 12:18
  • I think this is a good description of the underlying problem, but I'm still puzzled by the differing behavior when compiling under Ubuntu vs under Fedora. It doesn't look like Fedora is disabling `pie` support in gcc or anything like that. – larsks Aug 11 '23 at 12:19
  • I had seen the documentation on `cr_malloc`, but it explicitly says that applies to *dynamic* memory allocation, which we're not using here, which is why I hadn't gone down that route. It does seem to work around the problem. – larsks Aug 11 '23 at 12:20
  • Sounds like a bug in Criterion - if fails under ASLR, which isn't exactly an uncommon feature... – Andrew Henle Aug 11 '23 at 14:19
1

@jpalecek has identified the problem (thanks!). I wanted to add a quick note in case other folks run into this issue.

Using cr_malloc in place of static allocation does resolve the problem, but it substantially complicates the initialization of data structures by requiring multiple extra steps (cr_malloc some memory and then copy data into it). For a test like this, we could do something like the following to allow us to initialize data structure using static strings and then bulk-convert everything:

ParameterizedTestParameters(Example, test_example) {
    paramspec params[] = {
        {"this is a test"},
    };

    size_t nb_params = sizeof (params) / sizeof (paramspec);

    // Replace static strings with dynamic allocations
    // using cr_malloc()
    for (int i=0; i<nb_params; i++) {
        char *s = cr_malloc(strlen(params[i].input) + 1);
        strcpy(s, params[i].input);
        params[i].input = s;
    }

    return cr_make_param_array(paramspec, params, nb_params);
}

Since the problem is related to ASLR, we can also link with -no-pie to resolve the issue; building the code like this:

gcc -no-pie -o test_example test_example.c -lcriterion

Runs reliably in both environments.


Lastly, it looks like the reason we see the difference in behavior is that Ubuntu enables -pie by default, while Fedora does not.

larsks
  • 277,717
  • 41
  • 399
  • 399