3

I was reading a source code file, but I got stuck at the following line:

while (isspace (* bp & 0xff))
    ++ bp;

I know the basic idea is to skip the whitespace, but I don't know what exactly 0xff is doing here. The line comes from the following function:

static enum tokens scan (const char * buf)
{
    static const char * bp;

    if (buf)
        bp = buf;       /* new input line */

    while (isspace (* bp & 0xff))
        ++ bp;

    if (isdigit (* bp & 0xff) || * bp == '.')
    {
        errno = 0;
        token = NUMBER, number = strtod (bp, (char **) & bp);
        if (errno == ERANGE)
            error ("bad value: %s", strerror (errno));
    }
    else
        token = * bp ? * bp ++ : 0;

    return token;
}
edmz
Begginer
  • What's the type of `bp`? – edmz Sep 19 '15 at 11:27
  • Please show us the surrounding code, in particular the definition and contents of `bp`. – orlp Sep 19 '15 at 11:27
  • The Author has years of programming experience. And at first he wrote that like isspace (* bp) but then he changed it. So, I'm looking here for a meaningful reason. – Begginer Sep 19 '15 at 11:53

3 Answers

4

The isspace function and the other ctype.h functions expect an int as argument. From the C11 standard, section 7.4/1:

The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

This means that if you have code such as:

char ch = 'é';   // same as: char ch = -126; for some code pages

isspace(ch);

then this call causes undefined behaviour.

The rationale is that the function can then be implemented as a lookup table, for example: #define isspace(x) space_table[x]
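To make the lookup-table rationale concrete, here is a minimal sketch. The names `space_table`, `init_space_table`, `table_isspace` and `safe_index` are hypothetical and are not taken from any real C library; EOF handling is left out, and the table covers only the "C" locale whitespace characters.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical table, one entry per unsigned char value; not a real libc symbol. */
static unsigned char space_table[UCHAR_MAX + 1];

/* Mark the six whitespace characters of the "C" locale. */
void init_space_table(void)
{
    const char *ws = " \t\n\v\f\r";
    for (; *ws != '\0'; ++ws)
        space_table[(unsigned char)*ws] = 1;
}

/* Table lookup: only indices 0..UCHAR_MAX exist (EOF handling omitted). */
int table_isspace(int c)
{
    /* A negative char promoted to int would index before the table. */
    assert(c >= 0 && c <= UCHAR_MAX);
    return space_table[c];
}

/* Convert a plain char to a valid index, whatever the signedness of char. */
int safe_index(char ch)
{
    return (unsigned char)ch;
}
```

With such an implementation, passing a negative `char` directly would read outside the array, which is the kind of failure the standard's wording permits.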

Causing undefined behaviour is bad of course, so isspace(ch) is wrong. The correct way to fix the code is:

isspace( (unsigned char)ch );

On a machine that uses 2's complement arithmetic (and 8-bit bytes), ch & 0xFF is exactly equivalent to (unsigned char)ch.

On a machine that doesn't use 2's complement, it will look up the wrong value for negative characters (but not cause undefined behaviour).
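The equivalence on a 2's complement machine can be checked with a small sketch like the following. It assumes 8-bit `char`; the helper names `mask_to_int`, `cast_to_int` and `skip_spaces` are made up for illustration and do not appear in the original source file.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Assumption of this sketch: bytes are 8 bits wide. */
static_assert(CHAR_BIT == 8, "sketch assumes 8-bit char");

/* The form used in the question: mask after integer promotion. */
int mask_to_int(char ch)
{
    return ch & 0xFF;
}

/* The portable form: convert to unsigned char first. */
int cast_to_int(char ch)
{
    return (unsigned char)ch;
}

/* Hypothetical helper: skip leading whitespace using the cast. */
const char *skip_spaces(const char *p)
{
    while (isspace((unsigned char)*p))
        ++p;
    return p;
}
```

Comparing `mask_to_int` and `cast_to_int` for every value from `CHAR_MIN` to `CHAR_MAX` should show no difference on a 2's complement platform; the cast is the form that stays correct elsewhere.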

Perhaps your programmer was happy to assume that his code would never run on a non-2's complement machine with negative character codes for whitespace, and he felt that & 0xFF was more aesthetic than a cast.

Community
M.M
  • I do not want to believe that any compiler vendor would ever write isspace (and any other such function) in a way that it cannot be used for `char c` whatever is in this `c`... Can you show me an example of such compiler? – PiotrNycz Sep 19 '15 at 14:07
  • @PiotrNycz I can only show the C standard which governs all compilers – M.M Sep 19 '15 at 14:26
  • Obviously there was such a compiler in the past, otherwise the ANSI C authors wouldn't have felt obliged to write the standard this way to support such a compiler – M.M Sep 19 '15 at 14:29
  • `(ch & 0xFF)` does nothing for signed chars. `-1 & 0xFF` is still -1. – John Hammond Sep 19 '15 at 14:48
  • @M.M You are cheating ;) %d is the wrong parameter. Use %hhd and see it working. – John Hammond Sep 19 '15 at 15:12
  • @LarsFriedrich `ch & 0xFF` has type `int`, so `%d` is the right specifier. You seem to be forgetting about the integer promotions. Try `sizeof(ch & 0xFF)` if unconvinced. – M.M Sep 19 '15 at 15:20
  • M.M. If you would use `printf("%hhd", c);` with c being a char, there would be no point in also adding `& 0xFF`. And..if char is actually unsigned on the platform, it really serves no purpose ever. I mean, we agree on the problem. For some reason I get downvotes and you get upvotes. Sometimes I don't get SO. – John Hammond Sep 19 '15 at 15:46
  • @PiotrNycz In MS VC 2005, AFAIR, isspace( 0xFFFF ) will crash with access violation. Seen with other compilers too. – c-smile Sep 19 '15 at 15:54
  • @c-smile Your example does not show that `isspace` will not work with `char c = 0xffff; isspace(c);` – PiotrNycz Sep 19 '15 at 17:42
  • @PiotrNycz if `char` is `signed char` in your compiler then even that expression will fail. – c-smile Sep 19 '15 at 18:01
  • @c-smile A compiler (gcc on x86) where char is signed (CHAR_MIN is negative) - and it works well: ideone.com/qCr8Id – PiotrNycz Sep 19 '15 at 19:03
  • @LarsFriedrich Maybe something you consider correct actually isn't. Which is great, after all. – edmz Sep 20 '15 at 10:25
  • @LarsFriedrich `printf("%hhd", c);` with `c` being a char, has nothing to do with this question. `ch & 0xFF` is an `int`. The reason I'm getting upvotes and you downvotes is that I'm right and you're wrong – M.M Sep 23 '15 at 06:13
3

This operation forces every bit above the lowest 8 bits to zero.
In other words, the operation *ch & 0xff selects the lowest 8 bits, and isspace then checks whether that value is a whitespace character.

lsalamon
2

Computing an AND operation with 0xFF extracts the lowest byte, assuming 8 bits per byte. There's no effect for non-negative values, but char can also be signed; a negative char promotes to a negative int, which can't be represented in an unsigned char. Taking the lowest byte solves this problem.

Technically, in the expression ch & 0xFF, the operands are promoted to int, so the result is an int. That might have scared the programmer, because the parameter of isspace is an int as well; the value passed, however, shall fit into an unsigned char or be EOF, and EOF can only be represented in an int.
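A short sketch of the promotion described above, assuming a 2's complement platform. The function names are hypothetical and exist only to make the values inspectable.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns 1 if v is a value the <ctype.h> functions accept (EOF aside). */
int fits_unsigned_char(int v)
{
    return v >= 0 && v <= UCHAR_MAX;
}

/* Plain promotion: a negative char stays negative as an int. */
int promoted(char ch)
{
    return ch;
}

/* Masked form: the result is an int in the range 0..255. */
int masked(char ch)
{
    int v = ch & 0xFF;
    assert(fits_unsigned_char(v));   /* 0xFF never exceeds UCHAR_MAX */
    return v;
}

/* Both operands of & are promoted, so the expression has the size of int. */
size_t masked_size(char ch)
{
    return sizeof(ch & 0xFF);
}
```

On a platform where plain `char` is signed, `promoted` returns a negative value for characters above 127, while `masked` always stays within the range that `isspace` accepts.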

edmz