22

In C++, what's an easy way to turn this std::string:

\t\tHELLO WORLD\r\nHELLO\t\nWORLD     \t

Into:

HELLOWORLDHELLOWORLD
TemplateRex
Mr. Smith
  • @tomislav-maric I don't think it's a duplicate of that post, the OP there was working with a `cin` stream, and thus using iostream functions. – Mr. Smith Jan 09 '13 at 10:29
  • similar but not exact duplicate, so not voting to close. – CashCow Jan 09 '13 at 10:29
  • @CashCow I checked it again.. you are right, sorry about that. – tmaric Jan 09 '13 at 10:36
  • See also [Remove spaces from std::string in C++](http://stackoverflow.com/questions/83439/remove-spaces-from-stdstring-in-c) – user Feb 26 '14 at 02:17

6 Answers

35

Simple combination of std::remove_if and std::string::erase.

Not totally safe version (::isspace has undefined behaviour if a char holds a negative value):

s.erase( std::remove_if( s.begin(), s.end(), ::isspace ), s.end() );

For a safer version, replace ::isspace with

std::bind( std::isspace<char>, _1, std::locale::classic() )

(Include the relevant headers: <algorithm>, <functional>, <locale> and <string>. _1 comes from namespace std::placeholders.)

For a version that works with other character types, replace <char> with <ElementType> or whatever your templated character type is. You can of course also replace the locale with a different one. If you do that, take care not to look up the locale facet more often than necessary, since that is inefficient.

In C++11 you can make the safer version into a lambda with:

[]( char ch ) { return std::isspace<char>( ch, std::locale::classic() ); }
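
As a minimal sketch of how the pieces above fit together, the lambda can be dropped straight into the erase/remove_if call. The helper name strip_whitespace is an assumption made for illustration and is not part of the original answer.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper: returns a copy of s with all whitespace removed,
// where "whitespace" is decided by the classic "C" locale.
std::string strip_whitespace(std::string s)
{
    const std::locale& loc = std::locale::classic();
    s.erase(std::remove_if(s.begin(), s.end(),
                           [&loc](char ch) { return std::isspace<char>(ch, loc); }),
            s.end());  // the second argument to erase is required
    return s;
}
```

Calling strip_whitespace on the string from the question should yield HELLOWORLDHELLOWORLD.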
CashCow
  • @chris `::isspace` includes the new line as well: http://www.cplusplus.com/reference/cctype/isspace/ – Ivaylo Strandjev Jan 09 '13 at 10:28
  • it will. isspace will return true for newlines. – CashCow Jan 09 '13 at 10:28
  • `isspace` has UB for all characters except those in the basic something something. C99 §7.4/1. – R. Martinho Fernandes Jan 09 '13 at 10:30
  • How did you perform your output? Are you sure you didn't stick one in e.g .(std::cout << s << std::endl) – CashCow Jan 09 '13 at 10:31
  • Never mind, it was me being completely stupid and not passing the second argument to `erase` (I typed one up before the answer). – chris Jan 09 '13 at 10:31
  • @R.MartinhoFernandes does C99 standard apply to C++? C++ has its own standard. – CashCow Jan 09 '13 at 10:32
  • C++98 delegates the behaviour of the C standard library to C89, and C++11 delegates the behaviour of the C standard library to C99. – R. Martinho Fernandes Jan 09 '13 at 10:33
  • @chris Yes as std::remove_if returns an iterator, and erase has an overload for a single iterator, it will indeed compile and not give you the result you want if you forget the second s.end() – CashCow Jan 09 '13 at 10:34
  • @CashCow, I know, it's completely irritating when you forget it. In my case, I never saw the second argument when reading it how many times before I finally used it, so it's still wired in my brain that it only takes one. – chris Jan 09 '13 at 10:34
  • Presumably the -1 from Mr Fernandes for use of ::isspace. perhaps he will enlighten us as to the special locale-based / character-set-based? You know for perfect UTF-8 it is not necessarily even a character-char one-to-one relationship so no functor / lambda will work here officially. The only thing that will work for perfect UTF-8 iteration that might be multi-character is a custom iterator. – CashCow Jan 09 '13 at 10:51
  • FWIW, all the whitespace characters in the example are encoded as single byte sequences in UTF-8, so yes, a simple lambda works for UTF-8. – R. Martinho Fernandes Jan 09 '13 at 11:00
  • You are saying that what looks like a whitespace will never appear as part of a multibyte character? I don't know the UTF-8 standard. The only thing I see as "undefined" are things like   (non-breaking space) which is commonly ASCII 160 (or 0xA0) but might vary in other character sets. – CashCow Jan 09 '13 at 11:04
  • My apologies. I got slightly confused about the true nature of the problem :) I knew using isspace was wrong, but I got confused as to the why. The why is related to `isspace` taking an `int` and to `char` being signed. Here is a small program that explains the issue http://stacked-crooked.com/view?id=817f92f4a2482e5da0b7533285e53edb. – R. Martinho Fernandes Jan 09 '13 at 11:34
  • (And as a side note, NBSP is not in ASCII. ASCII has only 128 values). – R. Martinho Fernandes Jan 09 '13 at 11:35
  • (And note how this is not about multibyte encodings; any byte with a value higher than 0x7F in the source, *regardless of encoding* will trigger this issue; even single byte encodings like Latin-1 or Windows-1252 will cause it. Only 7-bit encodings like ASCII work fine) – R. Martinho Fernandes Jan 09 '13 at 11:53
  • Ok I have given the alternative answer that uses std::isspace with a locale. – CashCow Jan 09 '13 at 12:27
  • Doesn't the lambda version require a "return" statement? – PatchyFog Aug 12 '15 at 21:10
  • For C++ newbies like me _1 is from std::placeholders, and represent future arguments – bmatovu Mar 29 '17 at 14:44
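
The signed-char problem discussed in the comments above can also be avoided while staying with the `<cctype>` function, by converting each character to `unsigned char` before the call. This is a sketch of that alternative, not something the answer itself proposes; the helper name strip_whitespace_cctype is made up for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: removes whitespace using the <cctype> std::isspace.
std::string strip_whitespace_cctype(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](char ch) {
                               // std::isspace takes an int whose value must be
                               // representable as unsigned char (or be EOF).
                               // A plain char above 0x7F may be negative, so
                               // convert it to unsigned char first.
                               return std::isspace(static_cast<unsigned char>(ch)) != 0;
                           }),
            s.end());
    return s;
}
```

With the cast in place, bytes above 0x7F (for example from Latin-1 or UTF-8 text) are passed to the function as values in the range 128 to 255 instead of as negative numbers.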
13

In C++03, use a function object:

struct RemoveDelimiter
{
  // const so the predicate can be called on a const or temporary object
  bool operator()(char c) const
  {
    return (c =='\r' || c =='\t' || c == ' ' || c == '\n');
  }
};

std::string s("\t\tHELLO WORLD\r\nHELLO\t\nWORLD     \t");
s.erase( std::remove_if( s.begin(), s.end(), RemoveDelimiter()), s.end());

Or use a C++11 lambda:

s.erase(std::remove_if( s.begin(), s.end(), 
     [](char c){ return (c =='\r' || c =='\t' || c == ' ' || c == '\n');}), s.end() );

PS. This uses the erase-remove idiom.
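
To make the two halves of the erase-remove idiom visible, here is a sketch that splits the one-liner into separate steps. The function name remove_delimiters and the variable new_end are illustrative assumptions only.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper showing the erase-remove idiom in two explicit steps.
std::string remove_delimiters(std::string s)
{
    // Step 1: remove_if moves the characters to keep to the front and returns
    // an iterator to the new logical end. The string's size is unchanged and
    // the characters from new_end onwards are left in an unspecified state.
    std::string::iterator new_end = std::remove_if(
        s.begin(), s.end(),
        [](char c) { return c == '\r' || c == '\t' || c == ' ' || c == '\n'; });

    // Step 2: erase the leftover tail so the size matches the kept characters.
    s.erase(new_end, s.end());
    return s;
}
```

Passing both iterators to erase matters: the single-iterator overload also compiles, but it removes only one character.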

billz
4

C++11, using <regex>:

std::string input = "\t\tHELLO WORLD\r\nHELLO\t\nWORLD     \t";

auto rs = std::regex_replace(input,std::regex("\\s+"), "");

std::cout << rs << std::endl;

/tmp ❮❮❮ ./play

HELLOWORLDHELLOWORLD
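
The snippet above omits its headers. A self-contained sketch of the same approach might look like the following; the wrapper function strip_whitespace_regex is a hypothetical name, and making the regex object static is a design choice added here so the pattern is compiled only once.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper around the regex_replace call shown above.
std::string strip_whitespace_regex(const std::string& input)
{
    // \s matches whitespace such as space, \t, \r and \n in the default
    // ECMAScript grammar; every run of such characters is replaced by "".
    static const std::regex whitespace("\\s+");
    return std::regex_replace(input, whitespace, "");
}
```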
jassinm
4

In C++11 you can use a lambda rather than using std::bind:

str.erase(
    std::remove_if(str.begin(), str.end(), 
        [](char c) -> bool
        { 
            return std::isspace<char>(c, std::locale::classic()); 
        }), 
    str.end());
pje
3

You could use Boost.Algorithm's erase_all

#include <boost/algorithm/string/erase.hpp>
#include <iostream>
#include <string>

int main()
{
    std::string s = "Hello World!";
    // or the more expensive one-liner in case your string is const
    // std::cout << boost::algorithm::erase_all_copy(s, " ") << "\n";
    boost::algorithm::erase_all(s, " "); 
    std::cout << s << "\n";
}

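NOTE: erase_all(s, " ") removes only space characters, so tabs and newlines would each need their own call. As mentioned in the comments, trim_copy (or its cousins trim_left_copy and trim_right_copy) only removes whitespace from the beginning and end of a string.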

TemplateRex
  • I saw some solutions that used Boost, but I'm not after a `trim` function, trimming I believe is doing something like `XX___XX_` -> `XX_XX` whereas I want the final solution to be `XXXX`. – Mr. Smith Jan 09 '13 at 10:34
2

Stepping through the string character by character and using string::erase() should work fine, although each erase shifts the rest of the string, so it can be slow on long inputs.

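void removeWhitespace(std::string& str) {
    std::size_t i = 0;
    while (i < str.length()) {
        if (str[i] == ' ' || str[i] == '\n' || str[i] == '\t' || str[i] == '\r') {
            // After erasing, the next character moves into position i,
            // so the index is deliberately not advanced.
            str.erase(i, 1);
        } else {
            ++i;
        }
    }
}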
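
Because every erase call shifts the remainder of the string, the loop above can take quadratic time on long inputs. A single-pass variant that compacts the string in place is sketched below. This is an alternative technique rather than the answer's own code, and the name removeWhitespaceLinear is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical single-pass variant: copy each kept character down to the
// next free slot, then shrink the string once at the end.
void removeWhitespaceLinear(std::string& str)
{
    std::size_t write = 0;  // index of the next slot to fill
    for (std::size_t read = 0; read < str.length(); ++read) {
        const char c = str[read];
        if (c != ' ' && c != '\n' && c != '\t' && c != '\r') {
            str[write++] = c;
        }
    }
    str.resize(write);  // drop the leftover tail
}
```

This is essentially what std::remove_if followed by erase does, written out by hand.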
SelectricSimian