0

In this part of my code I remove white spaces of string1 and copy the result to string2.

char * remove_blank_spaces(char * string1) {
   char * string2 = malloc(sizeof(string1));
   int index = 0;

   for (int i = 0; string1[i] != 0; i++) {
      if(string1[i] != ' ') {
         //printf("i: %d\n", i);
         //printf("c2: %c\n", string1[i]);
         string2[index] = string1[i];
         index++;
      }
   }
   string2[index] = '\0';

   printf("string2: %s\n", string2);
   return string2;
}

I check the result with:

assert(remove_blank_spaces("a  b") == "ab");  // Edit: here is the error!

I got an error: Assertion failed! and Expression: remove_blank_spaces("a b") == "ab"

I compared the strings in Virtual-C and they look the same. Why the assertion is failing?

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
kame
  • 20,848
  • 33
  • 104
  • 159
  • 3
    use `strcmp` to compare strings, not `==`. – Blaze Sep 10 '19 at 09:48
  • 3
    `sizeof` of a pointer is the size of the pointer itself, not what it points to. To get the length of a string (not including the very important null-terminator) use [`strlen`](https://en.cppreference.com/w/c/string/byte/strlen). – Some programmer dude Sep 10 '19 at 09:48
  • 1
    As for your assertion, it's because you compare two pointers that will never be equal. As mentioned you solve that by using [`strcmp`](https://en.cppreference.com/w/c/string/byte/strcmp). Perhaps you should consider refreshing your basic C knowledge? – Some programmer dude Sep 10 '19 at 09:51
  • @Someprogrammerdude could I also use: int number_of_elements = sizeof(string1) / sizeof(char); char * string2 = malloc (number_of_elements * sizeof (char)); – kame Sep 10 '19 at 09:54
  • 1
    No, because `string1` is still a pointer, and `sizeof string1` will give you the size of the pointer itself. You need to use `strlen(string1) + 1` (the `+1` for the terminator). Also note that `sizeof(char)` is defined by the C specification to always be equal to `1`. – Some programmer dude Sep 10 '19 at 09:56
  • 1
    Lastly, your code as shown have a memory leak. Since you do not store the pointer returned by `remove_blank_spaces` you have no possibility to `free` the memory the function allocated. – Some programmer dude Sep 10 '19 at 10:01
  • @Someprogrammerdude could I free the memory behind `return string2;` ? – kame Sep 10 '19 at 10:33
  • 1
    @kame No, `return` returns immediately from the function. And if you call `free` before then the returned pointer would be invalid and could not be used for e.g. string comparisons. – Some programmer dude Sep 10 '19 at 10:38
  • 1
    There's so many classic beginner mistakes here that my recommendation is to step away from this program and find a decent C book. – Lundin Sep 10 '19 at 11:12
  • @Lundin I thought it would be good to practice instead of just reading the theory. Sorry I am still a beginner. – kame Sep 10 '19 at 11:17
  • @kame Writing lots of code is totally good practice and highly encouraged when you are learning. However, all of these problems have been asked about and answered many times over, so it isn't the best question for SO. As usual it is hard to find the dupes though. There's a little [C FAQ](https://stackoverflow.com/tags/c/info) below the tag wiki that may or may not be helpful. – Lundin Sep 10 '19 at 11:21
  • Yes as you said, it is very difficult to recognise that assert could be the problem. '==' instead of strcpy is used. Thank you. I will read C FAQ now. – kame Sep 10 '19 at 11:23

2 Answers2

1

Your code has a bug: malloc allocates insufficient space, and this results in undefined behaviour when trying to access unallocated memory.

The assertion is also failing because you are comparing pointers via ==, instead of C strings via strcmp.

Furthermore, I suggest making two changes:

  • Don’t mix computation and output. Return the value, don’t printf it inside the function.
  • Use descriptive and correct names. This requires taking context into account. For instance, index can generally be a good name, but in your case it’s unclear which index you’re referring to, and this invites errors, where index is used to index into the wrong variable. As for “correct” names, what you call “blank space” is more conventionally known as “whitespace”.

To improve the second point, I suggest actually changing the implementation and, instead of having a second index variable, to iterate over the output using a pointer. There are other possibilities, but this one has the advantage that accidentally indexing using the wrong variable is impossible.

Taking this together, we get

char *remove_whitespace(const char *str) {
    char *result = malloc(strlen(str) + 1);
    char *out = result;

    for (size_t i = 0; str[i] != '\0'; i++) {
       if (str[i] != ' ') {
           *out++ = str[i];
       }
    }
    *out = '\0';
    return result;
}

We could additionally do away with the i loop counter. Unfortunately the result is less readable, not more, because we would need to increment str at the end of the loop, and this would leave us with an unsightly for (; *str != '\0'; str++) loop construct.

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
  • 1
    As noted by others, this code also potentially allocates *way* too much memory. However, this might be acceptable, depending on the exact use-case. – Konrad Rudolph Sep 10 '19 at 10:30
  • May I ask you why you call the variable out out? – kame Sep 10 '19 at 11:04
  • 1
    @kame Because it’s a pointer used for *output*, i.e. to write values to (and where the code that uses it doesn’t care terribly where the value goes, just that it goes “somewhere”). In C++ this is formally known as an “output iterator” but the concept also exists in C. – Konrad Rudolph Sep 10 '19 at 13:51
1

For starters this function declaration

char * remove_blank_spaces(char * string1) {

is incorrect and only confuses users of the function. If within the function you are creating a new character array then the parameter shall have the qualifier const.

char * remove_blank_spaces( const char * string1) {

Otherwise the function should change the original string "in-place".

This call

char * string2 = malloc(sizeof(string1));

also is incorrect. I think you mean

char * string2 = malloc( strlen( string1 ) + 1 );

But even this call is not very good because the result string can be much less than the original string.

So at first you should count the numb er of characters in the result string and only then allocate the memory.

This assert is also incorrect

assert(remove_blank_spaces("a  b") == "ab");

In this expression there are compared addresses of two string: the first one is the string returned by the function and the second one is the string literal.

Even if you will write an expression like this

assert( "ab" == "ab");

the value of the expression can be equal either to logical true or false depending on the compiler option that specifies whether equal string literals are stored as one string literal or occupy different extents of memory.

You should write instead

assert( strcmp( remove_blank_spaces("a  b"), "ab" ) == 0 );

Take into account that it is reasonable also to consider trhe tab character '\t' in the if statement like

if(string1[i] != ' ' && string1[i] != '\t') {

Or you could use the standard function isblank.

Here is a demonstrative program

#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
#include <assert.h>
#include <string.h>

char * remove_blank_spaces( const char *s ) 
{
    size_t n = 0;

    for ( size_t i = 0; s[i] != '\0'; i++ )
    {
        if ( !isblank( ( unsigned char )s[i] ) ) ++n;
    }

    char *result = malloc( n + sizeof( ( char )'\0' ) );

    char *p = result;

    do
    {
        if ( !isblank( ( unsigned char )*s ) )
        {
            *p++ = *s;
        }
    } while ( *s++ != '\0' );

    return result;
}

int main(void) 
{
    const char *s1 = "a b";
    char *s2 = remove_blank_spaces( s1 );

    assert( strcmp( s1, s2 ) == 0 );

    puts( s2 );

    free( s2 );

    return 0;
}

The program output is

ab

Pay attention to that instead of the type int as it is shown in other answers you should use the type size_t for the variables index and i because it is the type that is used with string lengths and indices and by the function malloc. The type int is not large enough to store size of strings.

If you indeed want to declare the function like

char * remove_blank_spaces( char *s ) 

that is when the parameter does not have the qualifier const then you shall not allocate dynamically a new character array within the function and the function itself can look much simpler.

Here is a demonstrative program.

#include <stdio.h>
#include <assert.h>
#include <string.h>

char * remove_blank_spaces( char *s ) 
{
    char *destination = s;
    char *source = s;

    do
    {
        if ( *source != ' ' && *source != '\t' )
        {
            *destination++ = *source;
        }
    } while ( *source++ != '\0' );

    return s;
}

int main(void) 
{
    char s[] = "a b";

    remove_blank_spaces( s );

    assert( strcmp( s, "ab" ) == 0 );

    puts( s );

    return 0;
}

Its output is

ab
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • `sizeof( ( char )'\0' )` must be one of the most convoluted ways of spelling `1` that I’ve ever seen. Does it really increase readability? I doubt it. I’m also not sure what the cast to `unsigned char` in the `isblank` call is supposed to accomplish. It makes no sense. – Konrad Rudolph Sep 10 '19 at 10:37
  • 1
    @KonradRudolph Yes it indeed increases readability. Instead of using the magic number 1 it is better to specify for which character a byte is reserved. – Vlad from Moscow Sep 10 '19 at 10:39
  • You need to balance clutter against explicitness. And `+ 1` inside a `malloc` for a `char*` is universally understood to refer to the null termination (indeed, it couldn’t possibly mean anything else), there is no need to belabour the obvious. It’s akin to adding a comment after `i++` to say “this increments i”. In that sense, adding an offset of `1` in code is not generally seen as a “magic number”. – Konrad Rudolph Sep 10 '19 at 10:40
  • @KonradRudolph You can do as you like. Please, do not point me how to write a qualitative code. It will be better if you in your own code substitute the type int for the type size_t as it is done in my answer. Goof luck.) – Vlad from Moscow Sep 10 '19 at 10:45
  • “Please, do not point me how to write a qualitative code” — Uh, that’s the *whole purpose* of Stack Overflow. Incidentally, the suggestion to use `size_t` is good. – Konrad Rudolph Sep 10 '19 at 10:47