0

I wrote a small program as an example to replicate the problem I'm having. The program takes and manipulates two lines in this way:

  1. It seperates the line by new lines.
  2. Each line consists of some number of letters followed by a blank (in this case ' '), followed by a number.
  3. It copies the letters to another string, then adds '\0' to it.
  4. It copies the number to another char.
  5. It prints the string in which are the copied letters and int in which is the converted copied number.

Here is that minimal program:

#include <stdio.h>
#include <string.h>

void read(char *text)
{
    // create and initialize the line holder
    int k = 0;
    char line[10];
    line[0] = '\0';
    for (int i = 0;text[i] != '\0';i++)
    {
        if (text[i] == '\n')
        {
            // k now points to the character right after the last assigned one, so put 0 in that place
            line[k] = '\0';

            // initialize data objects that will hold text and number
            char letters[5];
            letters[0] = '\0';
            char val;

            // step through the line, and stop if you 1. reached a blank or 2. reached the end of a line
            int j = 0;
            while (line[j] != ' ' && line[j] != '\t' && j <= (strlen(line) - 1))
            {
                printf("%d <= %ld = %d\n", j, strlen(line) - 1, j <= (strlen(line) - 1));
                if (j == (strlen(line) - 1)) // reached the last character before reaching blank
                    return;
                letters[j] = line[j];
                j++;
            }

            letters[j] = '\0'; // where should be blank place 0

            if (j + 1 == (strlen(line) - 1)) // if the next character is the last character, meaning that the character before the last one is blank
                val = line[j + 1];
            else // there is space in that is not one before the last character
                return; // this is where read("\n") should stop, but withou entering the while loop!

            printf("Word: %s\tVal: %d\n", letters, val - '0');

            // empty the line holder
            line[0] = '\0';
            k = 0;
        }
        else
        {
            // place the ith text character into the kth line character and print them
            line[k] = text[i];
            printf("line[k] = %c\ttext[i] = %c\n", line[k], text[i]);

            // increment k for the next turn
            k++;
        }
    }
}

int main()
{
    char *text = "ABCD 0\nEFGH 1\nIJKL 2\nMNOP 3\nQRST 4\nUVWX 5\nYZ 5\n";
    read(text);
    printf("---------------------------------\n");
    read("\n");
    return 0;
}

There are also the points when this program should terminate without doing its job if it detects an error. Those points are indicated by return keyword and comments in the read(char *text) function. There are only two of them, so I'll describe them here as well:

Line 28: Program will stop with scanning on this line if it detected that the current character is the last character. Since the last character should always be preceded with a blank, it means that we reached the end of a line without exiting the while loop (which would have happened if we reached ' ' or '\t').

Line 38: If we successfully exited the while loop, the character j is offset from line should be a blank. That is because we exited the while loop when we found a blank (that is also because we end the line with line[j] = '\0'). That also means that j+1 should be a number, which is the last character in the line. If that is not the case, we reached the blank that is not preceding a number so we exit the function.


So, where's the problem? I passed two strings to read(char *text) function, as you can see. read(char *text) manipulates and prints the first string perfectly. With the second one, which is only "\n", the function doesn't work that well. The part that I don't understand is that we enter the while loop, despite the condition j <= strlen(line) - 1), which somehow returns 1 when text = "\n". You can see that by running the program, it prints that information on line 26.
Hanlon
  • 432
  • 4
  • 13
  • 12
    Note that [`strlen`](https://en.cppreference.com/w/c/string/byte/strlen) returns a `size_t` value, which is ***unsigned***. Now think about what happens when `strlen` returns `0` and you subtract `1` from that... – Some programmer dude Sep 13 '18 at 08:09
  • 1
    You cannot name your function `read`, because there is already a standard function of that name, and this reserves the name for the compiler even if you avoid including the header that defines the standard function. – Pascal Cuoq Sep 13 '18 at 08:10
  • 2
    why not use `j < strlen(line)` instead ? – Sander De Dycker Sep 13 '18 at 08:10
  • 2
    Read [how to debug small programs](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/). Enable all warnings and debug info, e.g. `gcc -Wall -Wextra -g` when using [GCC](http://gcc.gnu.org/) – Basile Starynkevitch Sep 13 '18 at 08:12
  • 2
    @PascalCuoq: are you really sure? IIRC `read` is a POSIX function, not a C11 standard one. But in practice you are fully right. Naming a function `read` is not reasonable – Basile Starynkevitch Sep 13 '18 at 08:13
  • 1
    @BasileStarynkevitch Good point! `read` is indeed a POSIX function, which means that http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html applies instead of the C standard, and it does not seem to claim the name as long as the header is not included. – Pascal Cuoq Sep 13 '18 at 08:51
  • 1
    @SanderDeDycker, I opose to using `j < strlen(line)` as a loop condition because it potentially calls `strlen` _every_ iteration. I also opose to "but the compiler knows this" or whatever, and especially new programmers should know the potential cost of writing such conditions. – Paul Ogilvie Sep 13 '18 at 09:15
  • 1
    @PaulOgilvie : I wasn't trying to make the OP's code more efficient (a lot more changes would be needed for that). I was just pointing out to the OP that there's no need to subtract `1` and then use `<=`, since simply using `<` achieves the same, and avoids the reported issue. – Sander De Dycker Sep 13 '18 at 09:34
  • @SanderDeDycker, ah...didn't see that. – Paul Ogilvie Sep 13 '18 at 09:41
  • [Why sizeof(int) is not greater than -1?](https://stackoverflow.com/q/24466857/995714), [void main() { if(sizeof(int) > -1) printf("true"); else printf("false"); ;](https://stackoverflow.com/q/20853451/995714) – phuclv Sep 14 '18 at 07:40

2 Answers2

7

In strlen(line) - 1, the type of strlen(line) is size_t, an unsigned integer type. With the definition of size_t on your compilation platform, C's promotion rules make the subtraction a size_t (unsigned) subtraction, giving a size_t(unsigned) result. The result is (size_t)-1, which is typically 0xffffffff or 0xffffffffffffffff.

While not providing the above explanation, this online C interpreter hints at the problem by indicating that you are passing the wrong type for the format %ld in printf. On the selected compilation platform, the above applies and the printf argument strlen(line) - 1 has type size_t, which should be printed with %zu.

This unsigned arithmetic causes your program to use line[j] while this memory location is uninitialized. If you change all occurrences of strlen(line) - 1 to (int)strlen(line) - 1, to force an int subtraction computing a signed result, then the program does not have undefined behavior.

As noted in the comments, changing strlen(line) to (int)strlen(line) is only a quick-and-dirty fix, and limits the scope of inputs the program can be applied to if int is narrower than size_t. The proper fix is to examine each larger expression involving the result of strlen and to rewrite it so that it does what the programmer intends using size_t arithmetic. As an example, the condition j == (strlen(line) - 1) can be written (size_t)j + 1 == strlen(line). This in turn suggests that a number of variables, including j, should be directly declared as size_t instead of int.

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
  • How do you mean that `line[j]` uninitialized? Please explain further. – Hanlon Sep 13 '18 at 08:26
  • Oh thanks. I casted all `strlen(line)` to `(signed int)strlen(line)`. Everything works fine now. – Hanlon Sep 13 '18 at 08:32
  • @Hanlon I did not realize it immediately, but `line[j]` being used initialized was another strange consequence of the subtraction `strlen(line)-1` not computing the result you expected. I have since updated my answer to suggest the same fix that you have found. – Pascal Cuoq Sep 13 '18 at 08:34
  • 2
    I can't really recommend casting to `int` - you'll get no warning if the value is truncated (and remember that `INT_MAX` could be just 32767, depending on where the code is compiled). It's better to make the code overflow-free (perhaps by adding 1 to the other side of a comparison, changing `<=` to `<`, or by testing the value before the arithmetic). – Toby Speight Sep 13 '18 at 11:27
  • 1
    @TobySpeight I agree with the principle of your comment, but as I was incorporating it in my answer, I arrived to the conclusion that if you do not at the same time change to `size_t` the type of a number of variables that are currently declared as `int`, you are not making the program significantly more robust (in particular if `INT_MAX` is only 32767). – Pascal Cuoq Sep 13 '18 at 16:15
  • Yes - good edit. :-) – Toby Speight Sep 13 '18 at 16:19
4

Solution to these kind of problems (and many others) is to turn on compiler warnings.

$ clang -Wall -Wextra -std=c11 -pedantic-errors k.c
k.c:24:59: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
            while (line[j] != ' ' && line[j] != '\t' && j <= (strlen(line) - 1))
                                                        ~ ^   ~~~~~~~~~~~~~~~~
k.c:26:67: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
                printf("%d <= %ld = %d\n", j, strlen(line) - 1, j <= (strlen(line) - 1));
                                                                ~ ^   ~~~~~~~~~~~~~~~~
k.c:27:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
                if (j == (strlen(line) - 1)) // reached the last character before reaching blank
                    ~ ^   ~~~~~~~~~~~~~~~~
k.c:35:23: warning: comparison of integers of different signs: 'int' and 'unsigned long' [-Wsign-compare]
            if (j + 1 == (strlen(line) - 1)) // if the next character is the last character, meaning that t...
                ~~~~~ ^   ~~~~~~~~~~~~~~~~
4 warnings generated.

Comparing signed and unsigned is indeed the problem here.

In order to perform the comparison, the operands are implicitly casted. j == (strlen(line) is equivalent to (size_t)j == (strlen(line) except that the latter will not generate a warning.

klutt
  • 30,332
  • 17
  • 55
  • 95