I have a program in which I wanted to remove the spaces from a string. Looking for an elegant way to do so, I found the following code in a forum (I've changed it a little to make it more readable):

char* line_remove_spaces (char* line)
{
    char *non_spaced = line;
    int i;
    int j = 0;
    for (i = 0; i <= strlen(line); i++)
    {
        if ( line[i] != ' ' )
        {
            non_spaced[j] = line[i];
            j++;
        }
    }
    return non_spaced;
}

As you can see, the function takes a string and, using the same allocated memory space, selects only the non-spaced characters. It works!

Anyway, according to Wikipedia, a string in C is a "null-terminated string". I always thought of it this way and everything was fine. But the problem is: we never put a null character at the end of the non_spaced string. And somehow the compiler knows that the string ends at the last character written through "non_spaced". How does it know?

unwind
vaulttech
  • What do you mean by "the compiler knows it"? You're changing it at runtime, compile process is long over. – Fred Apr 27 '12 at 11:34
  • For what it's worth, `strlen(line)` will recalculate the length of the string every time. This is a non-trivial calculation, and should not be done on every loop iteration. You would do much better to calculate it once and store it: `size_t len = strlen(line); for (i = 0; i <= len; i++)`. (Also, all the variables you have as `int`s should technically be type `size_t`.) – Chris Lutz Apr 27 '12 at 11:38

7 Answers

This does not happen by magic. You have in your code:

for (i = 0; i <= strlen(line); i++)
              ^^

The loop index i runs up to and including strlen(line). At that index the character array holds the nul character, which is not a space, so it gets copied as well. As a result, the output has a nul character at the right position.
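As a minimal sketch of that point, here is the question's loop with the length computed once. The name copy_without_spaces and the return value are assumptions added for illustration; they are not part of the original code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical variant of the question's function. Returns the number of
   bytes written, terminator included. */
size_t copy_without_spaces(char *line)
{
    assert(line != NULL);

    size_t len = strlen(line);   /* bytes before the terminator */
    size_t j = 0;

    /* i == len is the final iteration: line[len] is '\0', which is not a
       space, so it is copied like any other character. */
    for (size_t i = 0; i <= len; i++)
    {
        if (line[i] != ' ')
        {
            line[j] = line[i];
            j++;
        }
    }
    return j;
}
```

Called on a writable buffer such as `char s[] = "a b c";`, this should leave "abc" in s and return 4: three letters plus the copied terminator.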

If you had

for (i = 0; i < strlen(line); i++)
              ^

then you would have to add the nul character manually:

for (i = 0; i < strlen(line); i++)
{
    if ( line[i] != ' ' )
    {
        non_spaced[j] = line[i];
        j++;
    }
}
// put nul character
line[j] = 0;
codaddict
  • the null character would be '\0' not ' ' – KBN Apr 27 '12 at 11:42
  • @xFortyFourx: http://stackoverflow.com/questions/4705968/what-is-value-of-eof-and-0-in-c – codaddict Apr 27 '12 at 11:45
  • Hello codaddict, I'm not sure why you posted that thread. The ASCII value of '\0' = 0 ASCII value of ' ' (empty/white space) = 32 – KBN Apr 27 '12 at 11:50
  • The last line `line[j] = 0;` should be `non_spaced[j] = 0;`. And @KBN, this code is to remove the spaces from the given string, read the OP's question and you'll know :) – starriet Jul 29 '22 at 02:31
  • @starriet haha, that's 10years old, I've no idea what's going on,. – KBN Jul 30 '22 at 03:11
Others have answered your question already, but here is a faster, and perhaps clearer version of the same code:

void line_remove_spaces (char* line)
{
  char* non_spaced = line;

  while(*line != '\0')
  {
    if(*line != ' ')
    {
      *non_spaced = *line;
      non_spaced++;
    }

    line++;
  }

  *non_spaced = '\0';
}
Lundin

The loop uses <= strlen so you will copy the null terminator as well (which is at i == strlen(line)).

Andreas Brinck

You could try it. Debug it while it is processing a string containing only one space: " ". Watch carefully what happens to the index i.
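If no debugger is at hand, a rough equivalent is to print every step of the loop. The sketch below assumes a hypothetical helper named trace_remove_spaces; it is the question's loop with printf calls added, not code from the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the question's loop with a printf at every step.
   Returns the number of bytes written, terminator included. */
size_t trace_remove_spaces(char *line)
{
    assert(line != NULL);

    size_t len = strlen(line);
    size_t j = 0;

    for (size_t i = 0; i <= len; i++)
    {
        if (line[i] != ' ')
        {
            /* The byte value is printed as a number so '\0' shows up as 0. */
            printf("i=%zu: copy byte %d to index %zu\n", i, (int) line[i], j);
            line[j] = line[i];
            j++;
        }
        else
        {
            printf("i=%zu: skip space\n", i);
        }
    }
    return j;
}
```

For the one-space string `char s[] = " ";`, the trace has two lines: the space is skipped at i=0, and at i=1 the byte 0 (the terminator) is copied to index 0, leaving an empty string.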

nik7
Martin James

How do you know that it "knows"? The most likely scenario is that you're simply having luck with your undefined behavior, and that there is a '\0'-character after the valid bytes of line end.

It's also highly likely that you're not seeing spaces at the end, which might be printed before hitting the stray "lucky '\0'".

A few other points:

  • There's no need to write this using indexing.
  • It's not very efficient to call strlen() on each loop iteration.
  • You might want to use isspace() to remove more whitespace characters.

Here's how I would write it, using isspace() and pointers:

#include <ctype.h>  /* isspace() */

char * remove_spaces(char *str)
{
  char *ret = str, *put = str;

  for(; *str != '\0'; str++)
  {
    if(!isspace((unsigned char) *str))
      *put++ = *str;
  }
  *put = '\0';

  return ret;
}

Note that this does terminate the space-less version of the string, so the returned pointer is guaranteed to point at a valid string.

unwind
  • If `line` is a null terminated string, then `non_spaced` is guaranteed to be null terminated so it's no "lucky `\0`" – Anthales Apr 27 '12 at 11:39

The string parameter of your function is null-terminated, right? And in the loop, the null character of the original string also gets copied into the returned non-spaced string. So the non-spaced string is actually null-terminated too!

For your compiler, the null character is just another byte that doesn't get any special treatment, but string APIs use it as a convenient marker to detect the end of a string.
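To illustrate, here is a sketch of how a string function can find the end of a string by scanning for that byte. The name my_strlen is hypothetical; real strlen() implementations are typically more optimized, but the result should be the same.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts bytes up to the first '\0', the way
   strlen() is specified to do. */
size_t my_strlen(const char *s)
{
    assert(s != NULL);

    const char *p = s;
    while (*p != '\0')   /* stop at the first nul byte, wherever it is */
        p++;
    return (size_t) (p - s);
}
```

With `char buf[] = "abc\0def";` the array occupies 8 bytes (sizeof buf), yet both my_strlen(buf) and strlen(buf) return 3, because the scan stops at the first nul byte and never looks at what follows.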

OOEngineer

If you use i <= strlen(line), the loop also visits index strlen(line), which holds the '\0', so the terminator is copied and your program works. You can step through it in a debugger to confirm.

Yun
Hong Wei