
I would be interested to know if it's possible to explicitly taint a variable in C as uninitialized.

Pseudocode:

{
    int *array;
    int b;
    array = some_alloc();
    b = array[0];
    some_free(array);
    TAINT_MACRO(array);  /* hypothetical: mark 'array' as uninitialized again */

    /* the compiler should raise an uninitialized warning here */
    b = array[0];
}

Here is one way to taint a variable, but GCC raises the warning when `a` is assigned the uninitialized variable, rather than at the second use of `a`.

{
    int a = 10;
    printf("first %d\n", a);
    do {
        int b;
        a = b;
    } while(0);
    printf("second %d\n", a);
}

The only solution I could come up with is to explicitly shadow the variable with an uninitialized one (the `(void)` casts are there to avoid unused-variable warnings).

#define TAINT_MACRO_BEGIN(array) (void)(array); { void **array; (void)array;
#define TAINT_MACRO_END(array) } (void)(array);
{
    int *array;
    int b;
    array = some_alloc();
    b = array[0];
    some_free(array);
    TAINT_MACRO_BEGIN(array);

    /* the compiler should raise an uninitialized warning here */
    b = array[0];
    TAINT_MACRO_END(array);
}

This method is too intrusive to add to existing code (it adds a lot of noise and is annoying to maintain), so I was wondering if there is some other way to tell the compiler that a variable is uninitialized.

I know there are static checkers, and I do use them, but I'm looking for something that can give a warning at compile time without false positives, which I believe is possible in this case and could avoid a certain class of bugs.

Yu Hao
ideasman42
  • What if a pointer was freed in one translation unit, then used in another? The compiler can't catch that. – Collin May 25 '13 at 03:04
  • `some_index` is not declared in your first example. I'd expect all compilers to complain about that, rather than about initialization. – chux - Reinstate Monica May 25 '13 at 03:06
  • Not sure why you do not like the warning in your 2nd example at `a = b;`. A compiler that also complained about subsequent bad uses (`printf("second %d\n", a);`) would be verbose. The first warning will suffice for most debugging. – chux - Reinstate Monica May 25 '13 at 03:12
  • @chux I renamed some_index to 0 to avoid confusion. The intent of assigning `a = b` is to make `a` an uninitialized variable. While I agree the way the compiler works is not _wrong_, I'm looking for some way to taint a variable so I can make use of a compiler warning, so I accept that a trick (a compiler attribute/builtin or similar) may be required; this could be wrapped in a macro, for example. – ideasman42 May 25 '13 at 03:25
  • @Collin, you're right: this won't catch use of freed memory across functions, but even without that ability I think it would be useful to catch simple use-after-free bugs at compile time. – ideasman42 May 25 '13 at 03:28
  • Are you asking how to implement a memory debugging tool? Have you looked at electric fence? – jxh May 25 '13 at 03:50
  • @user315052, no: tools like valgrind, DUMA, Electric Fence, etc. work at runtime. I'm looking for the ability to tell the compiler a variable is uninitialized so it will warn on usage until it's reassigned. – ideasman42 May 25 '13 at 03:52
  • I see, you are trying to implement a poor man's static analysis tool. Why not just use a real static analysis tool? Since you are willing to decorate your code, [splint](http://www.splint.org/) may be a good fit for you. – jxh May 25 '13 at 03:57
  • @user315052, I was anticipating that this would come up, and yes, I use real static analysis tools, including the one you mentioned. The problem is they have so many false positives that it's impractical to quiet them all. The case I'm talking about is simple and limited, but could be made to have zero false positives, with the bonus that it happens at compile time, so issues don't slip through until you next run static analysis. – ideasman42 May 25 '13 at 04:05
  • @user315052: Do all static analysis tools provide annotation features that allow one to taint variables? In fact, it would be nice to know which tools have that feature. – AnT stands with Russia May 25 '13 at 06:11
  • @AndreyT: All static analysis tools that I am aware of can track when code is accessing freed memory. – jxh May 25 '13 at 06:31
  • @AndreyT, they do but they make assumptions that are often incorrect, hence - false positives. – ideasman42 May 25 '13 at 06:37
  • @AndreyT: I do agree there are many false positives. They all provide suppression features, but these involve adding knowledge-base rules to their system, adding special comments to the code to quiet those cases, or even modifying the code in some way that the tool understands better. – jxh May 25 '13 at 07:18
  • Please correct me if I'm wrong, but isn't `int b; a = b` invalid (or veeeery bad) because of a read without a write before? – Uroc327 May 25 '13 at 15:30
  • @Uroc327, it's intentionally invalidating the value, so that a compiler might notice this and track `a` as uninitialized. As it turns out, GCC and Clang don't do that, but some similar trick could possibly be made to work. – ideasman42 May 25 '13 at 17:28
  • @ideasman42: This is not just about "freed memory". This is about a much more general feature, one that can easily be shown to be far more useful than anything static tools can detect automatically. If I have an invariant in my program that ties two (or more) entities to each other, invalidating one entity should taint all the others. A static tool won't be able to figure that out for me. – AnT stands with Russia May 25 '13 at 22:50
  • @AndreyT, right: you may set up some array on the stack (or even a single variable, for that matter), and because of your code's own internal logic its contents become invalid (but still initialized as far as the compiler is concerned). The ability to taint shows intent, and a compiler warning would be handy too, to catch a situation the original developer didn't intend. – ideasman42 May 26 '13 at 08:11
  • I can sort of see how this would be possible without shadowing for a local variable, but I can't see a solution for function parameters without shadowing. – jxh Feb 08 '17 at 01:29
  • AFAICS it would need to be supported by the compiler, or the compiler would need to interpret an action *(such as explicitly assigning an uninitialized variable in an unambiguous way)* as re-tainting the existing variable as uninitialized. – ideasman42 Feb 12 '17 at 04:10

3 Answers


I sent an answer on the GCC list, but since I use SO first myself...

In modern C and C++, I would expect programmers to use limited variable scope to control this kind of exposure.

For example, I think you want something like this (note that the attribute I'm using doesn't actually exist; I'm just trying to paraphrase your request).

int x = 1; // initialized 
int y;     // uninitialized 

x = y;     // use of uninitialized value 'y' 

y = 2;     // no longer uninitialized 
x = y;     // fine 

y = ((__attr__ uninitialized))0; // tell gcc it's uninitialized again 

x = y;    // warn here please. 

If so, I would use additional scopes in C99 (or later) or C++ (pretty sure it's had "declare at point of use" since at least ARM in 1993...):

int x = 1; // initialized 

{ 
    int y; // uninitialized 
    x = y; // warn here 
    y = 2; // ok, now it's initialized 
    x = y; // fine, no warning 
} 

{ 
    int y; // uninitialized again! 
    x = y; // warns here 
} 

The extra scopes are a bit off-putting, but I'm very used to them in C++ (from heavy use of RAII techniques.)

Since there is an answer for this in mainstream languages, I don't think it's worth adding to the compiler.

Looking at your example, you're concerned with an array. That should work just as well with the extra scopes, and there should be no extra runtime cost, since the entire stack frame is allocated on function entry (AFAIK, at least).
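Applied to the allocate/free pattern from the question, the scoping idea might look like the following sketch. It uses standard `malloc`/`free` in place of the question's hypothetical `some_alloc`/`some_free`, and the function name `use_array` is only illustrative. Once the block closes, `array` is out of scope, so a later `b = array[0];` becomes a compile-time error rather than merely a warning.

```c
#include <stdlib.h>

/* Illustrative only: malloc/free stand in for the question's
   hypothetical some_alloc()/some_free(). */
static int use_array(void)
{
    int b = 0;

    {
        /* 'array' exists only inside this block */
        int *array = malloc(4 * sizeof *array);
        if (array == NULL) {
            return -1;
        }
        array[0] = 42;
        b = array[0];
        free(array);
    }

    /* 'array' is out of scope here: writing  b = array[0];  would be
       rejected as an undeclared identifier, not just warned about. */
    return b;
}
```

The design trade-off is the one discussed in the comments below: the compiler enforces the rule completely, at the cost of an extra block and indentation level.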

AnthonyFoiani
  • I sometimes do this; when chunks of code can be split locally into blocks that each have their own variables, it sometimes makes sense. However, in large functions with a lot of nested blocks I really would prefer not to add more indentation by adding blocks just because of the chance that a variable is used when it shouldn't be. Basically, indenting large blocks of existing code is quite disruptive (it messes with commit history), and in most cases I'd say the trade-off isn't worthwhile. A single line to taint a variable, on the other hand, doesn't cause much noise, so IMHO it's acceptable. – ideasman42 May 27 '13 at 16:53
  • I understand your objections, but I sincerely believe you'll be better off in the long run if you restructure your code to use more scopes -- whether by adding additional scopes to existing functions, or by splitting your functions up into smaller ones. If you're running into length limitations, and running out of sideways room for indentation, those are both hints that you need to factor that function out into smaller ones. Finally, you're essentially asking the compiler to treat the variable "like new"; actually making it "new" would, IMHO, be less confusing. Good luck! – AnthonyFoiani May 27 '13 at 23:41
  • This is the good-practice answer. In reality, the problem in the question can be best avoided by not reusing variables within a single scope, which is a) a code smell, and b) a *really bad* smell if the variables can become invalid halfway through their lifespan. If this can happen, what you *actually have* is two variables, so why not make it explicit? – Alex Celeste Jan 03 '16 at 12:59

Based on an answer to a different question, you can use setjmp and longjmp to make a changed local variable have an indeterminate value.

#include <setjmp.h>  /* setjmp, longjmp, jmp_buf */
#include <string.h>  /* memset */

#define TAINT(x)                             \
        do {                                 \
            static jmp_buf jb;               \
            if (setjmp(jb) == 0) {           \
                memset(&x, '\0', sizeof(x)); \
                longjmp(jb, 1);              \
            }                                \
        } while (0)

If `x` is a local variable, its value will be indeterminate in the lines of code after `TAINT` is applied to it. This is because of C11 §7.13.2.1 ¶3 (emphasis mine):

All accessible objects have values, and all other components of the abstract machine have state, as of the time the longjmp function was called, except that the values of objects of automatic storage duration that are local to the function containing the invocation of the corresponding setjmp macro that do not have volatile-qualified type and have been changed between the setjmp invocation and longjmp call are indeterminate.

Note that no diagnostic is required for using a variable that is so tainted. However, compiler writers are aggressively detecting undefined behavior to enhance optimization, and so I would be surprised if this remains undiagnosed forever.
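A minimal usage sketch follows (the function `demo` is hypothetical). Because actually reading a tainted variable would use an indeterminate value, the sketch assigns a fresh value to `x` before reading it again, so it runs with well-defined behavior; the comment marks the spot where one would hope for a diagnostic.

```c
#include <setjmp.h>
#include <string.h>

/* The answer's macro, unchanged. */
#define TAINT(x)                             \
        do {                                 \
            static jmp_buf jb;               \
            if (setjmp(jb) == 0) {           \
                memset(&x, '\0', sizeof(x)); \
                longjmp(jb, 1);              \
            }                                \
        } while (0)

/* Hypothetical helper: uses x, taints it, then gives it a new value. */
static int demo(int start)
{
    int x = start;
    int before = x;  /* ordinary use of x before tainting */

    TAINT(x);        /* from here on, x has an indeterminate value */

    /* reading x here (e.g. "return x;") is what one would like diagnosed */

    x = 7;           /* assigning a new value makes x usable again */
    return before + x;
}
```

Only `x` is changed between the `setjmp` and `longjmp` calls, so `before` keeps its value. Some compilers may also emit a "might be clobbered by longjmp" warning here (GCC's `-Wclobbered`), which is a side effect of the technique rather than the uninitialized-use warning being sought.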

Community
jxh

I would go the other way around, and wrap taint macros around the allocation and free functions. This is what I have in mind:

#ifdef O_TAINT
volatile int taint_me;
#define TAINT(x, m) \
    if (taint_me) { goto taint_end_##x; } else {} x = m
#define free(x) free(x); taint_end_##x: (void)0
#else
#define TAINT(x, m) x = m
#endif

So, your example would look like this:

int *array;
int b;

TAINT(array, malloc(sizeof(int)));
b = array[0];
printf("%d\n", b);
free(array);

/* the compiler should raise an uninitialized warning here */
b = array[0];
printf("%d\n", b);

This isn't perfect. There can only be one call to free() per tainted variable, because the goto label is tied to the variable name. If the jump skips over other initializations, you may get other false positives. It doesn't work if the allocation occurs in one function, and the memory freed in a different function.

But it provides the behavior you asked for in your example. When compiled normally, no warnings appear. If compiled with `-DO_TAINT`, a warning will appear at the second assignment to `b`.
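Below is a self-contained sketch that reuses the answer's macros unchanged; the function name `reuse_after_free` is hypothetical. Unlike the example above, it does not read `array` after `free()` (that would be a use-after-free at run time). Instead, it re-initializes the pointer, which is the case that should *not* produce a warning. Two details follow from the limitations listed above. `b` is initialized at its declaration, so the `goto` path does not introduce an extra false positive. The second release is written `(free)(array)`, because parenthesizing the name stops the function-like `free` macro from expanding, so the label is not defined twice.

```c
#include <stdlib.h>   /* must come before the free() macro is defined */

#ifdef O_TAINT
volatile int taint_me;
#define TAINT(x, m) \
    if (taint_me) { goto taint_end_##x; } else {} x = m
#define free(x) free(x); taint_end_##x: (void)0
#else
#define TAINT(x, m) x = m
#endif

/* Hypothetical helper: allocate, use, free, then re-allocate. */
static int reuse_after_free(void)
{
    int *array;
    int b = 0;   /* initialized so the goto path adds no false positive */

    TAINT(array, malloc(sizeof *array));
    if (array == NULL) {
        return -1;
    }
    array[0] = 1;
    b = array[0];
    free(array);  /* with O_TAINT this also defines label taint_end_array */

    /* With -DO_TAINT, the compiler sees a path (the goto) on which
       'array' was never assigned; reading array[0] here is what it
       should flag. Re-assigning the pointer clears that state: */
    array = malloc(sizeof *array);
    if (array == NULL) {
        return -1;
    }
    array[0] = 2;
    b += array[0];

    (free)(array); /* bypasses the macro: no second taint_end_array label */
    return b;
}
```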


I did work out a fairly general solution, but it involves bracketing the whole function with begin/end macros, and it relies on GCC's `typeof` extension. The solution ends up looking like this:

void foo (int *array, char *buf)
{
    TAINT_BEGIN2(array, buf);
    int b;

    puts(buf);
    b = array[0];
    printf("%d\n", b);

    free(array);
    free(buf);

    /* the compiler should raise an uninitialized warning here */
    puts(buf);
    b = array[0];
    printf("%d\n", b);

    TAINT_END;
}

Here, TAINT_BEGIN2 is used to declare the two function parameters that will get the taint treatment. Unfortunately, the macros are kind of a mess, but easy to extend:

#ifdef O_TAINT
volatile int taint_me;
#define TAINT(x, m) \
    if (taint_me) { goto taint_end_##x; } else {} x = m
#define TAINT1(x) \
    if (taint_me) { goto taint_end_##x; } else {} x = x##_taint
#define TAINT_BEGIN(v1) \
    typeof(v1) v1##_taint = v1; do { \
    typeof(v1##_taint) v1; TAINT1(v1)
#define TAINT_BEGIN2(v1, ...) \
    typeof(v1) v1##_taint = v1; TAINT_BEGIN(__VA_ARGS__); \
    typeof(v1##_taint) v1; TAINT1(v1)
#define TAINT_BEGIN3(v1, ...) \
    typeof(v1) v1##_taint = v1; TAINT_BEGIN2(__VA_ARGS__); \
    typeof(v1##_taint) v1; TAINT1(v1)
#define TAINT_END } while(0)
#define free(x) free(x); taint_end_##x: (void)0
#else
#define TAINT_BEGIN(x) (void)0
#define TAINT_BEGIN2(...) (void)0
#define TAINT_BEGIN3(...) (void)0
#define TAINT_END (void)0
#define TAINT1(x) (void)0
#define TAINT(x, m) x = m
#endif
jxh
  • Regarding your solution, it looks like it actually uses `taint_me` before it's initialized? Or would you define a shadow for it elsewhere? – ideasman42 May 25 '13 at 04:47
  • @ideasman42: `taint_me` is a global, so it gets initialized to 0. But since it is declared volatile, the compiler cannot assume at compile time that it is still 0. – jxh May 25 '13 at 04:49
  • Regarding begin/end macros, I'm sure they can be made to work; the problem is that this is too disruptive and not something I could commit to my project's codebase. It would change hundreds (maybe thousands) of lines and IMHO doesn't read very nicely. I think it would be tricky to allow reassigning within the block too (maybe possible with typeof() and macro magic, I guess). If it were possible to do in a single statement, it could be wrapped into a macro around a function call (free, for example), which could then be used without having to go over all source files and edit them. – ideasman42 May 25 '13 at 04:51
  • @ideasman42: I can redefine `free()` to do what `TAINT_END` does if that would make the solution more palatable. You should realize that your approach is kind of hackish, so solutions are likely to be hackish as well. – jxh May 25 '13 at 04:56
  • There are so many places in the code where malloc() and free() aren't in the same function that I don't think it's worth attempting to wrap malloc() and free() in macros that start/end a block of code. Since the compiler is already aware of a variable's initialized state, I was hoping there was some way to control it without having to make larger changes to flow control or limit a variable's scope. Whether the solution is hackish or not, if it can be made to work reliably I think it would be useful. – ideasman42 May 25 '13 at 05:02
  • You have valid objections to the proposed solution. I have indicated its limitations in the answer. I believe you will have to make a feature request to your compiler vendor to get something better (than either your solution or mine). – jxh May 25 '13 at 05:45
  • @ideasman42 *Since the compiler is already aware of a variables initialized state,...* You are making a false assumption. In general it is not possible for the compiler to know this. Data flow analysis is a) optional and b) impossible for many real world cases where the state of initialization depends on input at *runtime*. – Jens May 25 '13 at 11:25
  • @Jens, in some cases GCC/Clang can know for sure and will warn if a variable is uninitialized; there is also the case where they are not sure, hence `-Wmaybe-uninitialized`. Being able to reset the initialized state would have the same limits that exist now. But I see your point. – ideasman42 May 25 '13 at 12:22