1

I have code in Code Reveiw that "works" as expected, yet may have UB .

Code has an array of same-sized char arrays called GP2_format[]. To detect if the pointer format has the same value as the address of one of the elements GP2_format[][0], the below code simple tested if the pointer was >= the smallest element and <= the greatest. As the elements are size 1, no further checking needed.

const char GP2_format[GP2_format_N + 1][1];
const char *format = ...;

if (format >= GP2_format[0] && format <= GP2_format[GP2_format_N]) Inside()
else Outside();

C11 §6.5.8/5 Relational operators < > <= >= appears to define this as the dreaded Undefined Behavior when comparing a pointer from outside the array.

When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, ... of the same array object, they compare equal. ...(same object OK) .... (same union OK) .... (same array OK) ... In all other cases, the behavior is undefined.


Q1 Is code's pointer compare in GP2_get_type() UB?
Q2 If so, what is a well defined alternate, search O(1), to the questionable GP2_get_type()?

Slower solutions
Code could sequentially test format against each GP2_format[] or convert the values to intptr_t, sort one time and do a O(ln2(n)) search.

Similar
...if a pointer is part of a set, but this "set" is not random, it is an array.
intptr_t approach - maybe UB.

#include <stdio.h>

typedef enum {
  GP2_set_precision,
  GP2_set_w,
  GP2_setios_flags_,
  GP2_string_,
  GP2_unknown_,
  GP2_format_N
} GP2_type;

const char GP2_format[GP2_format_N + 1][1];

static int GP2_get_type(const char *format) {
  // candidate UB with pointer compare
  if (format >= GP2_format[0] && format <= GP2_format[GP2_format_N]) {
    return (int) (format - GP2_format[0]);
  }
  return GP2_format_N;
}

int main(void) {
  printf("%d\n", GP2_get_type(GP2_format[1]));
  printf("%d\n", GP2_get_type("Hello World"));  // potential UB
  return 0;
}

Output (as expected, yet potentially UB)

1
5
Community
  • 1
  • 1
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
  • Sidenote; Can I just ask why do you think it is potentially UB? I think it is clear that it is UB. – 2501 Feb 02 '16 at 21:14
  • You have to ensure the pointer is inside the array anyway. Two pointers can only be compared if they point to the same object (or **exactly one** past the last entry in an array). – too honest for this site Feb 02 '16 at 21:16
  • @Olaf *Any* pointer may be compared with == or !=. – 2501 Feb 02 '16 at 21:18
  • Comparing two pointers to check, if they point to the same object, may not yield expected results on all architectures, for example if there are multiple segments with identical addresses. Not all address spaces are flat as in your pc (i.e. the address of a function may be the same as the address of an integer, one in instruction and one in data memory). Having said that, I think the pointer comparison is probably safe for you (depending a bit on how you get the address "format" from and how portable your code must be), even if the standard doesn't guarantee it. – Ctx Feb 02 '16 at 21:25
  • 1
    @2501: OP clearly asks for ordered compares (e.g. `<=`). For those operators the comparison is _undefined behaviour_. – too honest for this site Feb 02 '16 at 21:28
  • I agree with @2501 this is obviously UB. The standard is very clear here. However, you can use equality comparators to check if it matches the address of an element. (sidenote: For some OS kernel implementation, I treat the whole system memory a a single array, so the comparison becomes valid again. That might invoke othjer UB, though). Welcome in the minefield! – too honest for this site Feb 02 '16 at 21:30
  • @Olaf I was responding to you. Your statement needed clarification and I provided it. – 2501 Feb 02 '16 at 21:37
  • @Ctx the standard says it has to work for two pointers pointing to the same object. The compiler has to provide this guarantee. AFAIK this is usually done on a segmented architecture by requiring any object to fit in one segment. – M.M Feb 02 '16 at 22:27
  • @Olaf The use of `<=` with pointers, etc. is not required for a solution, it is just the method originally used. – chux - Reinstate Monica Feb 02 '16 at 23:33
  • @2501 "why do you think it is potentially UB?" --> being hopeful. The C spec is quite long and a good understanding of a portion is sometimes undone by a less familiar portion somewhere else. – chux - Reinstate Monica Feb 02 '16 at 23:42
  • The answer to Q1 is yes, it's UB. Q2 should be posted as a separate question, with the code posted on code review, so that the question is self-contained and does not rely on links. As is, your question is an XY problem. – user3386109 Feb 02 '16 at 23:45
  • @user3386109 As stated in the post, the code is posted on code review - that is the link. The main question here is Q2 which echoes the title. – chux - Reinstate Monica Feb 02 '16 at 23:58
  • Q2 is problem Y. Problem X is how to write a generic-based printf that works. I only know that because I followed the link to code review. If you want the answer to X, you need to actually ask X here on SO. – user3386109 Feb 03 '16 at 00:15
  • 1
    I have programmed C on many different microcontrollers, microcomputers, mini computers, desktop/laptop computers, mainframes. In all that time (ever since C became available) the relational comparison of one pointer value to another pointer value has never failed. My suggestion: If the compiler is not raising a warning or error message, then do not worry about it. (of course, always compile with all warnings enabled) – user3629249 Feb 03 '16 at 20:07

2 Answers2

2

If you want to comply with the C Standard then your options are:

  • Perform individual == or != tests against each pointer in the target range
    • You could use a hash table or search tree or something to speed this up, if it is a very large set
  • Redesign your code to not require this check.

A "probably works" method would be to cast all of the values to uintptr_t and then do relational comparison. If the system has a memory model with absolute ordering then it should define uintptr_t and preserve that ordering; and if it doesn't have such a model then the relational compare idea never would have worked anyway.

M.M
  • 138,810
  • 21
  • 208
  • 365
  • Hmmm, Doing an individual test on each element may even work very nice on a "reasonable" platform with code like `if (format == a[0] || format == a[1] || format == a[2] || ... || format == a[n])` and a good compiler. Even if the system lacks "memory model with absolute ordering", it _certainly_ has some ordering that the compiler can deduce and generate optimize code. I shall try it. – chux - Reinstate Monica Feb 02 '16 at 23:46
2

This is not an answer to the stated question, but an answer to the underlying problem.

Unless I am mistaken, the entire problem can be avoided by making GP_format a string. This way the problem simplifies to checking whether a pointer points to within a known string, and that is not UB. (If it is, then using strchr() to find a character and compute its index in the string would be UB, which would be completely silly. That would be a serious bug in the standard, in my opinion. Then again, I'm not a language lawyer, just a programmer that tries to write robust, portable C. Fortunately, the standard states it's written to help people like me, and not compiler writers who want to avoid doing hard work by generating garbage whenever a technicality in the standard lets them.)

Here is a full example of the approach I had in mind. This also compiles with clang-3.5, since the newest GCC I have on the machine I'm currently using is version 4.8.4, which has no _Generic() support. If you use a different version of clang, or gcc, change the first line in the Makefile accordingly, or run e.g. make CC=gcc.

First, Makefile:

CC  := clang-3.5
CFLAGS  := -Wall -Wextra -std=c11 -O2
LD      := $(CC)
LDFLAGS :=
PROGS   := example

.PHONY: all clean

all: clean $(PROGS)

clean:
    rm -f *.o $(PROGS)

%.o: %.c
    $(CC) $(CFLAGS) -c $^

example: out.o main.o
    $(LD) $^ $(LDFLAGS) -o $@

Next, out.h:

#ifndef   OUT_H
#define   OUT_H 1
#include <stdio.h>

typedef enum {
    out_char,
    out_int,
    out_double,
    out_FILE,
    out_set_fixed,
    out_set_width,
    out_set_decimals,
    out_count
} out_type;

extern const char out_formats[out_count + 1];

extern int outf(FILE *, ...);

#define out(x...) outf(stdout, x)
#define err(x...) outf(stderr, x)

#define OUT(x) _Generic( (x),           \
    FILE *: out_formats + out_FILE,     \
    double: out_formats + out_double,   \
    int:    out_formats + out_int,      \
    char:   out_formats + out_char      ), (x)

#define OUT_END         ((const char *)0)
#define OUT_EOL         "\n", ((const char *)0)
#define OUT_fixed(x)    (out_formats + out_set_fixed), ((int)(x))
#define OUT_width(x)    (out_formats + out_set_width), ((int)(x))
#define OUT_decimals(x) (out_formats + out_set_decimals), ((int)(x))

#endif /* OUT_H */

Note that the above OUT() macro expands to two subexpressions separated by a comma. The first subexpression uses _Generic() to emit a pointer within out_formats based on the type of the macro argument. The second subexpression is the macro argument itself.

Having the first argument to the outf() function be a fixed one (the initial stream to use) simplifies the function implementation quite a bit.

Next, out.c:

#include <stdlib.h>
#include <stdarg.h>
#include <stdio.h>
#include <errno.h>
#include "out.h"

/* out_formats is a string consisting of ASCII NULs,
 * i.e. an array of zero chars.
 * We only check if a char pointer points to within out_formats,
 * if it points to a zero char; otherwise, it's just a normal
 * string we print as-is.
*/
const char out_formats[out_count + 1] = { 0 };

int outf(FILE *out, ...)
{
    va_list args;

    int     fixed = 0;
    int     width = -1;
    int     decimals = -1;

    if (!out)
        return EINVAL;

    va_start(args, out);

    while (1) {
        const char *const format = va_arg(args, const char *);

        if (!format) {
            va_end(args);
            return 0;
        }

        if (*format) {
            if (fputs(format, out) == EOF) {
                va_end(args);
                return 0;
            }
        } else
        if (format >= out_formats && format < out_formats + sizeof out_formats) {
            switch ((out_type)(format - out_formats)) {

            case out_char:
                if (fprintf(out, "%c", va_arg(args, int)) < 0) {
                    va_end(args);
                    return EIO;
                }
                break;

            case out_int:
                if (fprintf(out, "%*d", width, (int)va_arg(args, int)) < 0) {
                    va_end(args);
                    return EIO;
                }
                break;

            case out_double:
                if (fprintf(out, fixed ? "%*.*f" : "%*.*e", width, decimals, (float)va_arg(args, double)) < 0) {
                    va_end(args);
                    return EIO;
                }
                break;

            case out_FILE:
                out = va_arg(args, FILE *);
                if (!out) {
                    va_end(args);
                    return EINVAL;
                }
                break;

            case out_set_fixed:
                fixed = !!va_arg(args, int);
                break;

            case out_set_width:
                width = va_arg(args, int);
                break;

            case out_set_decimals:
                decimals = va_arg(args, int);
                break;

            case out_count:
                break;
            }
        }
    }
}

Note that the above lacks even OUT("string literal") support; it's quite minimal implementation.

Finally, the main.c to show an example of using the above:

#include <stdlib.h>
#include "out.h"

int main(void)
{
    double  q = 1.0e6 / 7.0;
    int     x;

    out("Hello, world!\n", OUT_END);

    out("One seventh of a million is ", OUT_decimals(3), OUT(q), " = ", OUT_fixed(1), OUT(q), ".", OUT_EOL);

    for (x = 1; x <= 9; x++)
        out(OUT(stderr), OUT(x), " ", OUT_width(2), OUT(x*x), OUT_EOL);

    return EXIT_SUCCESS;
}

In a comment, chux pointed out that we can get rid of the pointer inequality comparisons, if we fill the out_formats array; then (assuming, just for paranoia's sake, we skip the zero index), we can use (*format > 0 && *format < out_type_max && format == out_formats + *format) for the check. This seems to work just fine.

I also applied Pascal Cuoq's answer on how to make string literals decay into char * for _Generic(), so this does support out(OUT("literal")). Here is the modified out.h:

#ifndef   OUT_H
#define   OUT_H 1
#include <stdio.h>

typedef enum {
    out_string = 1,
    out_int,
    out_double,
    out_set_FILE,
    out_set_fixed,
    out_set_width,
    out_set_decimals,
    out_type_max
} out_type;

extern const char out_formats[out_type_max + 1];

extern int outf(FILE *, ...);

#define out(x...) outf(stdout, x)
#define err(x...) outf(stderr, x)

#define OUT(x) _Generic( (0,x),         \
    FILE *: out_formats + out_set_FILE, \
    double: out_formats + out_double,   \
    int:    out_formats + out_int,      \
    char *: out_formats + out_string    ), (x)

#define OUT_END         ((const char *)0)
#define OUT_EOL         "\n", ((const char *)0)
#define OUT_fixed(x)    (out_formats + out_set_fixed), ((int)(x))
#define OUT_width(x)    (out_formats + out_set_width), ((int)(x))
#define OUT_decimals(x) (out_formats + out_set_decimals), ((int)(x))

#endif /* OUT_H */

Here is the correspondingly modified implementation, out.c:

#include <stdlib.h>
#include <stdarg.h>
#include <stdio.h>
#include <errno.h>
#include "out.h"

const char out_formats[out_type_max + 1] = {
    [ out_string ] = out_string,
    [ out_int ] = out_int,
    [ out_double ] = out_double,
    [ out_set_FILE ] = out_set_FILE,
    [ out_set_fixed ] = out_set_fixed,
    [ out_set_width ] = out_set_width,
    [ out_set_decimals ] = out_set_decimals,
};

int outf(FILE *stream, ...)
{
    va_list args;

    /* State (also, stream is included in state) */
    int     fixed = 0;
    int     width = -1;
    int     decimals = -1;

    va_start(args, stream);

    while (1) {
        const char *const format = va_arg(args, const char *);

        if (!format) {
            va_end(args);
            return 0;
        }

        if (*format > 0 && *format < out_type_max && format == out_formats + (size_t)(*format)) {
            switch ((out_type)(*format)) {

            case out_string:
                {
                    const char *s = va_arg(args, char *);
                    if (s && *s) {
                        if (!stream) {
                            va_end(args);
                            return EINVAL;
                        }
                        if (fputs(s, stream) == EOF) {
                            va_end(args);
                            return EINVAL;
                        }
                    }
                }
                break;

            case out_int:
                if (!stream) {
                    va_end(args);
                    return EINVAL;
                }
                if (fprintf(stream, "%*d", width, (int)va_arg(args, int)) < 0) {
                    va_end(args);
                    return EIO;
                }
                break;

            case out_double:
                if (!stream) {
                    va_end(args);
                    return EINVAL;
                }
                if (fprintf(stream, fixed ? "%*.*f" : "%*.*e", width, decimals, va_arg(args, double)) < 0) {
                    va_end(args);
                    return EIO;
                }
                break;

            case out_set_FILE:
                stream = va_arg(args, FILE *);
                if (!stream) {
                    va_end(args);
                    return EINVAL;
                }
                break;

            case out_set_fixed:
                fixed = !!va_arg(args, int);
                break;

            case out_set_width:
                width = va_arg(args, int);
                break;

            case out_set_decimals:
                decimals = va_arg(args, int);
                break;

            case out_type_max:
                /* This is a bug. */
                break;
            }
        } else
        if (*format) {
            if (!stream) {
                va_end(args);
                return EINVAL;
            }
            if (fputs(format, stream) == EOF) {
                va_end(args);
                return EIO;
            }
        }
    }
}

If you find a bug or have a suggestion, please let me know in the comments. I don't actually need such code for anything, but I do find the approach very interesting.

Community
  • 1
  • 1
Nominal Animal
  • 38,216
  • 5
  • 59
  • 86
  • Thanks for the interests. I see making `out_formats[]` a 1 dimensional array may simplify, but I think the problem still exist, although much rarer. `out("",...` passes `""` as `format` to `if (format >= out_formats && ...`and that is the same UB concern as the post as specified in C11 §6.5.8/5. The "using `strchr()`" idea is unclear to me as that deals with pointers to the same array. +1 for the simplification - almost there. – chux - Reinstate Monica Feb 03 '16 at 08:58
  • OTOH this is the germ of a modified idea: with `out_formats[out_count + 1] = { 0,1,2,3,… };` then the test becomes `if (*format < out_count && format == out_count[*format]) Inside()`. Do you think that modification to your idea would work? – chux - Reinstate Monica Feb 03 '16 at 09:00
  • @chux: Yep, that modification works just fine (although I use much more strict checking); see my modified answer. I kinda like this approach. – Nominal Animal Feb 03 '16 at 13:34
  • @chux: ping. Using `(*format > 0 && *format < out_type_max && format == out_formats + (size_t)(*format))` the modified approach works, with no UB involved. (You might wish to write it as `(*format > 0 && *format < out_type_max && format == &(out_formats[(size_t)(*format)]))` to make it clear and explicit for others reading the code, but AFAIK the two expressions are equivalent.) – Nominal Animal Feb 07 '16 at 10:24
  • ping. Your worthy post has much useful info on solving my ultimate issue. Yet M.M was closer to this post. – chux - Reinstate Monica Feb 07 '16 at 19:22
  • @chux: No problem; I just wanted to make sure you noticed that with the modification you suggested, there is no UB, that's all. As to your ultimate issue, I'd look into one function call per element instead, possibly via preprocessor magic that expands a single macro with multiple arguments to separate macros. It gets really hairy, but the little testing I did, indicates it might be more fruitful than your current approach; might even allow for fast dedicated I/O routines. (Assuming the macro magic could made to be working. I do have an idea..) – Nominal Animal Feb 07 '16 at 20:13
  • Look forward to chatting sometime about these ideas. Not sure when. – chux - Reinstate Monica Feb 07 '16 at 20:21
  • @chux: That's fine. I do have a public e-mail address at my [home page](http://www.nominal-animal.net/). It turns out Jonathan Leffler [posted an answer](http://stackoverflow.com/a/2308651/1475978) that can be used to find the number of parameters to a macro. We can turn `OUT()`⇒``, `OUT(arg)`⇒`OUTfn(arg)`, `OUT(arg1, arg2)`⇒`OUTfn(arg1), OUTfn(arg2)`, and so on, up to 62 arguments (if no arguments is handled properly). Apply `_Generic()` to turn each `OUTfn()` into a type-specific function call. For settings, dedicated structure types with function-like macro initializers seem to work. – Nominal Animal Feb 07 '16 at 22:05