While revisiting some code I had written, I noticed that the build commands in my test scripts did not invoke scan-build correctly. I have a fix ready, but I have some questions about the capabilities of scan-build and the Clang static analyzer.

Can the analyzer discover errors at link time? If so, how?

For example, within a single source file it's easy to discover memory management errors (leaks, double frees, use-after-free, etc.), but can it still discover such errors when the allocation and deallocation go through interface functions implemented in another translation unit?

I wrote two files to test whether it can do that, and apparently it cannot.

/* memlib.c */
#include <stdlib.h>
void *foo_alloc(int len) { return malloc(len * 4); }
void foo_dealloc(void *foo) { free(foo); }

/* mem-main.c */
void *foo_alloc(int len);
void foo_dealloc(void *foo);

int main()
{
    int *p;

    p = foo_alloc(2);
    p[1] = 32;
    p = foo_alloc(1);
    p[0] = 54;
    foo_dealloc(p);
    p[0] = 47;
    foo_dealloc(p);

    return 0;
}

The compilation command:

scan-build sh -c '$CC "$@"' foo -o mem-main mem-main.c memlib.c
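The quoting in that command is easy to misread, so here is a small sketch of how the arguments flow. scan-build exports `CC` as its own compiler wrapper; in the sketch `CC` is set to `echo` (a stand-in chosen for illustration) so that the resulting command line is printed rather than executed. `show_cmd` is a hypothetical helper name.

```shell
# Mimic the environment scan-build provides, with echo standing in for the
# analyzer's compiler wrapper (an assumption for illustration only).
show_cmd() {
    # "foo" becomes $0 of the inner shell; everything after it becomes "$@".
    CC=echo sh -c '$CC "$@"' foo "$@"
}

show_cmd -o mem-main mem-main.c memlib.c
# prints: -o mem-main mem-main.c memlib.c
```

So `foo` is only a placeholder for `$0`, and both source files are handed to a single `$CC` invocation.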

I'm using the scan-build from PyPI, but I think that's pretty much irrelevant as it's just a program driver.

As a side note, I'm open to recommendations for tools that can perform link-time analysis.

DannyNiu
  • Could you be a bit more specific about which link-time problem you want to discover? Do you mean calling `free` twice for the same block, and the memory leak for the first allocation? – Gerhardh May 23 '22 at 07:48
  • I've added a paragraph. But actually, I'm interested in any and all link-time errors, provided that it can discover them. – DannyNiu May 23 '22 at 07:56
  • Commercial static analyzers typically have an option to include several .c files like a project. I haven't used the clang one but maybe it can do that too. Note that the root cause of the bugs in your case is the "creative" program design with function declarations in one .c file and definitions in another. Normal C programs aren't written like this, but use header files for the declarations. – Lundin May 23 '22 at 08:55
  • @Lundin I moved my declarations to a header, and it still shows no diagnosis. I'm using the `scan-build` from https://pypi.org/project/scan-build/ – DannyNiu May 23 '22 at 09:19
  • I don't know if Clang can find these errors. But I know that PVS-Studio can find them in the intermodular analysis mode. It will issue the following warnings: V774 The 'p' pointer was used after the memory was released. Check lines: 'memlib.c:5', 'mem-main.c:13', 'mem-main.c:14'. V586 The 'foo_dealloc' function is called twice for deallocation of the same memory space. mem-main.c(15), mem-main.c(13) – AndreyKarpov May 23 '22 at 09:30

1 Answer

Clang has experimental support for analyzing across translation units. See the Clang documentation for Cross Translation Unit (CTU) Analysis. However, it's currently (2022-05-23) a fairly messy proposition, as explained in the linked document. A summary of the basic steps is:

  1. Use clang++ -emit-ast to create .ast files for each translation unit (TU).
  2. Use clang-extdef-mapping to make a list of definitions in each TU.
  3. Use sed (!) to make ad-hoc fixes to the definition list files, specifically, changing ".cpp" to ".cpp.ast" and changing file paths to be relative.
  4. Run the analysis like this:
$ clang++ --analyze \
    -Xclang -analyzer-config -Xclang experimental-enable-naive-ctu-analysis=true \
    -Xclang -analyzer-config -Xclang ctu-dir=. \
    -Xclang -analyzer-output=plist-multi-file \
    main.cpp
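
Adapted to the two C files from the question, the steps above might look roughly like the following. This is an untested sketch: it assumes `clang` and `clang-extdef-mapping` from the same LLVM release are on `PATH`, and that the mapping tool prints absolute paths under the current directory. `fix_defmap` and `run_ctu` are hypothetical helper names. Whether the analyzer then reports the leak, use-after-free, and double free depends on the Clang version.

```shell
# Step 3 helper (hypothetical name): read definition-map lines on stdin and
# turn "<usr> /abs/dir/file.c" into "<usr> file.c.ast".
# $1 is the absolute project directory to strip from the paths.
fix_defmap() {
    sed -e 's/\.c$/.c.ast/' -e "s|$1/||"
}

# Hypothetical driver; call it from the directory holding both source files.
run_ctu() {
    # 1. Serialize the AST of the TU that holds the definitions.
    clang -emit-ast -o memlib.c.ast memlib.c || return 1
    # 2 and 3. List its external definitions (the trailing "--" means no
    #    compilation database is used), then fix up the paths.
    clang-extdef-mapping memlib.c -- | fix_defmap "$(pwd)" > externalDefMap.txt
    # 4. Analyze the caller with CTU enabled; ctu-dir is the directory that
    #    holds externalDefMap.txt and the .ast files.
    clang --analyze \
        -Xclang -analyzer-config -Xclang experimental-enable-naive-ctu-analysis=true \
        -Xclang -analyzer-config -Xclang ctu-dir=. \
        -Xclang -analyzer-output=plist-multi-file \
        mem-main.c
}
```

The block only defines the two functions; running `run_ctu` performs the actual analysis. The path fix-up is kept in its own function so it can be checked without Clang installed.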

Does it work? The presentation Using the Clang Static Analyzer by Vince Bridgers at an LLVM meetup in 2020, slide 25, shows that cross-translation-unit analysis approximately doubled the number of findings across five code bases. Some findings are lost as well, but that will be a mix of lost false positives (good) and lost true positives (bad), and that presentation doesn't further elaborate. (My guess, though, is the majority are lost FPs.)

Regarding tool recommendations, one of the main ways that commercial static analysis tools differ from the open source tools is more accurate inter-procedural and cross-translation-unit analysis. If this is of particular interest, you may want to look into available commercial tools. (Disclosure: I formerly worked for a commercial static analysis vendor and have related ongoing financial interests.)

Scott McPeak