Removing White Spaces

Question

I use the following to remove spaces from my variable

        for (i=0, ptr=lpsz;ptr[i];ptr++)
        {
            if (*ptr == ' ')
                i++;

            *ptr = ptr[i];
        }
        *ptr=0;

It seems to be having problems when there is more than one space however and I am not sure what I am doing wrong. Can anyone help me please?

It replaces a space with the next character in the string and then moves onto the character *after* the replacement. The replacement character is never checked. — xen-0, Aug 21 '13 at 13:56
@xen-0 thank you. My string is like `hello world today`. and after it reads `helloworldoday` missing the first letter of the third word. As a newbie im confused..lol — Natalie Carr, Aug 21 '13 at 13:59
@xen-0 I realize my error of wording in my question. Thank you — Natalie Carr, Aug 21 '13 at 14:00

score 6 · Answer 1 · edited May 23 '17 at 12:16

6

I suggest you use std::isspace instead of doing your pointer magic.

If you use a std::string in combination with std::isspace you can do something like this:

std::string str = "Hello World Today";

str.erase(remove_if(str.begin(), str.end(), isspace), str.end());

source

A string really is just a container of characters, and therefore you can apply the erase/remove idiom to it.

edited May 23 '17 at 12:16

Community

1
1

answered Aug 21 '13 at 13:56

Tony The Lion

61,704
67
242
415

@0x499602D2: Really, it is `remove_if` that is replacing the pointer magic. – Ben Voigt Aug 21 '13 at 14:02
5

Note that this code is **not** safe and can result in undefined behavior, e.g., when using it with my name! The problem is that the `std::isspace()` and family accepts only positive arguments but `char` may be signed. You need to use something like similar to `[](unsigned char c){ return std::isspace(c); }`. Of course, to replace the original code, `std::remove(str.begin(), str.end(), ' ')` would work. – Dietmar Kühl Aug 21 '13 at 14:05
@DietmarKühl Thanks for that note. Interesting. – Tony The Lion Aug 21 '13 at 14:06
Thank you will try to implement :) – Natalie Carr Aug 21 '13 at 14:06
@DietmarKühl You do realize since you mentioned that very fact to me in comment to one of my answers a *year* ago I have *never* seen you're name on a question or answer and not thought back about it. Talk about getting into a guys head =P – WhozCraig Aug 21 '13 at 14:09
@WhozCraig: You are basically saying I'm now the ``-guy in your mind? Although it could be worth, I do think that I have a few more areas of expertise than just that but possibly this particular area is the overlap over our interests. – Dietmar Kühl Aug 21 '13 at 14:15
@DietmarKühl No doubt. You've expertise-o-plenty. I'm saying after you pointed this out in one of my answers a long time ago just *seeing your name on a question/answer* reminds me to never make that mistake again. Don't think of pink elephants. Too late. =P – WhozCraig Aug 21 '13 at 14:20
I thought of pink elephants and they were incredibly pink. :) – Tony The Lion Aug 21 '13 at 14:20
@TonyTheLion *Exactly*. – WhozCraig Aug 21 '13 at 14:21
the only problem with isspace is that it considers only whitespace characters in the C locale, hence it's not Unicode safe. – Trasplazio Garzuglio Apr 21 '16 at 16:46

Ben Voigt · Accepted Answer · 2013-08-21T17:57:57.637

This should work. It's much easier to think about two pointers moving through the string than one pointer plus an offset.

auto sptr = lpsz;
auto dptr = lpsz;
while (*dptr = *sptr) { // copy up to and including terminating NUL
    if (*dptr != ' ') dptr++; // but spaces take no slots in the destination
    sptr++;
}

Demo: http://ideone.com/Pmmc20

or even

auto sptr = lpsz;
auto dptr = lpsz;
while (auto c = *dptr = *sptr) {
    dptr += (c != ' ');
    sptr++;
}

The essential difference between this and the original code is that when we see something other than a space, both mine and the original move the read and write location forward one. But when we see a space, I move the read position forward by 1 and don't move the write position, while the original moves the read position forward by 2 and the write position forward by one, which skips a character. Also, the original is testing at the write location for a space, instead of at the read location (mine tests the write location, but it does so after copying the character from the read location, so the result is correct).

David Rodríguez - dribeas · Answer 3 · 2013-08-21T19:38:58.477

You should use algorithms for this, as they are well tested. But if you want to analyze your code and understand the failure consider a high level description of what your code is running (See Tony's answer).

You maintain two indices into the buffer, one for reading and one for writing. Whenever the reading head detects a space you move it but skip the write. If the character is not a space you use the read head to get a value and write it through the write head.

In your implementation the read head is ptr+i, which is spelled a bit strange as an offset from the write head (as in ptr[i]). The write head is ptr (*ptr =). But your test in the loop is using the write head instead of the read head: if (*ptr==' ').

Even if you fix this, the implementation has other issues, like for example the fact that if there are two consecutive spaces, since you do a single test in the loop. A rewrite of your algorithm could be:

char* remove_spaces(char* buffer) {
   char *read = buffer;
   for (char *read = buffer; (*read), ++read) {
      if (*read != ' ') {      // copy element
         *buffer = *read;
         ++buffer;             // and increment write head
      }
   }
   *buffer = 0;                // ensure null termination
   return read;
}

Now, the algorithm can be further improved (performance) by removing the number of writes to memory if you do an initial search for the first space and use that as the starting point for the loop above. That will reduce the number of operations and the number of cache lines that are marked dirty.

xen-0 · Answer 4 · 2013-08-21T14:18:35.777

0

Step through your loop carefully.

i is set to 0 at the start of the loop. For the first space encountered, i is incremented (so i==1). The character at ptr is replaced with the character at prt+i, which is the next character. This works the first time, because i is 1.

But for the second space, i is set to 2 (since it's incremented), so the space is replaced by the character at ptr+2

It would be much easier to do this as part of a string copy, rather than an in-place alteration. dest is a destination buffer for our altered copy.

for(ptr = lpsz; *ptr; ptr++){
    if(' ' == *ptr) {continue;}
    *dest = *ptr;
    dest++;
}
*dest = 0;

edited Aug 21 '13 at 14:18

answered Aug 21 '13 at 14:13

xen-0

709
4
12

That it will. Which is amusing, because I picked up on that in the original code too. Edited code. – xen-0 Aug 21 '13 at 14:18
@DavidRodríguez-dribeas: I'd rather say it doesn't do anything special with spaces but it does whatever it does with zeros ;) (when you wrote your comment and I wrote this comment originally) – Dietmar Kühl Aug 21 '13 at 14:18

Removing White Spaces

4 Answers4