5

I'm looking for an elegant way to transform an std::string from something like:

std::string text = "   a\t   very  \t   ugly   \t\t\t\t   string       ";

To:

std::string text = "a very ugly string";

I've already trimmed the external whitespace with boost::trim(text);

[edit] To clarify: each run of spaces and tabs should be reduced to a single space. [/edit]

Removing the external whitespace is trivial. But is there an elegant way of removing the internal whitespace that doesn't involve manual iteration and comparison of previous and next characters? Perhaps something in boost I have missed?

nerozehl
  • Just a note, I've not really used `boost::split` and `boost::join`, but the obvious way to write this in Python is `' '.join(text.split())`, and something similar should be possible. It's not necessarily as efficient as something that copies the bytes straight to their final location, but it's concise and clear. – Steve Jessop Feb 19 '12 at 19:01
  • Yeah; split and join work great if you don't mind copying; if you are worried about efficiency (in this case), writing your own loop is probably best. – Marshall Clow Feb 19 '12 at 19:29
  • @Marshall: I'm working on the basis that the question says, "elegant", not "fast but ugly" ;-) – Steve Jessop Feb 20 '12 at 09:50
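For comparison with the split/join idea, the hand-written loop mentioned in the comments above could look roughly like the following. This is only a sketch: the name `collapse_whitespace` is made up for illustration, and it builds a new string rather than editing in place.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the thread): collapses each run of
// whitespace into one space and drops leading/trailing whitespace.
std::string collapse_whitespace(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    bool pending_space = false;  // a whitespace run was seen since the last word
    for (unsigned char c : in) {
        if (std::isspace(c)) {
            pending_space = true;
        } else {
            // Emit the separator lazily, and never before the first word.
            if (pending_space && !out.empty()) out += ' ';
            out += static_cast<char>(c);
            pending_space = false;
        }
    }
    return out;
}
```

Because the separator is only written when another word follows, trailing whitespace is dropped without a separate trim step.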

6 Answers

8

You can use std::unique with std::remove along with ::isspace to compress multiple whitespace characters into single spaces:

std::remove(std::unique(std::begin(text), std::end(text), [](char c, char c2) {
    return ::isspace(c) && ::isspace(c2);
}), std::end(text));
Seth Carnegie
  • It will not solve his problem. `text` also contains `'\t'` which is not equal to `' '`. – Nawaz Feb 19 '12 at 16:25
  • Won't this also do things like "letting" -> "leting" and skip over ` \t` pairs? – Travis Gockel Feb 19 '12 at 16:26
  • Whoops fixed it again, previously it wouldn't combine, for instance, a space and a tab next to each other, but now it does. – Seth Carnegie Feb 19 '12 at 16:34
  • 2
    Doesn't this result in `"a\tvery ugly string"` for the sample input, which is wrong? You could add a pass of `transform` (or maybe a `boost::transform_iterator`?) to replace all whitespace with space characters, but sometimes it's OK to give up and write a loop ;-) – Steve Jessop Feb 19 '12 at 18:43
  • 1
    Why ``std::remove``? You need ``std::replace_if`` after ``std::unique`` to replace ``\t`` characters with ``' '`` and it still wouldn't remove the leading and trailing whitespaces. This answer doesn't do what the OP asked. – Fernando Silveira Nov 28 '18 at 12:58
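Putting the comments above together, a version that first maps every whitespace character to a space and then erases the duplicates might look like this. It is a sketch under assumptions, not the answerer's code: `squeeze_spaces` is a hypothetical name, and the function deliberately does not trim the ends.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: turns tabs, newlines, etc. into spaces, then
// collapses adjacent spaces. Leading/trailing runs become ONE space;
// they are not removed.
std::string squeeze_spaces(std::string text) {
    auto is_space = [](unsigned char c) { return std::isspace(c) != 0; };
    std::replace_if(text.begin(), text.end(), is_space, ' ');
    // std::unique only shifts elements forward; erase() drops the leftover tail.
    text.erase(std::unique(text.begin(), text.end(),
                           [](char a, char b) { return a == ' ' && b == ' '; }),
               text.end());
    return text;
}
```

Since the question says the string has already been passed through `boost::trim`, the untrimmed ends should not matter there; otherwise a trim step is still needed.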
7
std::istringstream iss(text);
text = "";
std::string s;
while(iss >> s){
     if ( text != "" ) text += " " + s;
     else text = s;
}
//use text, extra whitespaces are removed from it
Nawaz
  • 1
    Ah, interesting way of doing it, +1, though I've no idea which is more efficient between yours and mine (or that it matters for small strings or "cold" areas of code) – Seth Carnegie Feb 19 '12 at 16:29
  • I think, in the else-block, `text.append(" " + s);` would be a little bit faster. – Nawaz Feb 19 '12 at 16:36
  • That wouldn't do the same thing would it? (Right now it overwrites what was there before with `operator=` but `append` would be like changing it to `+=`; I think it might be a typo in the original code) – Seth Carnegie Feb 19 '12 at 16:40
  • @SethCarnegie: But that is what we want. Sorry, it was supposed to be `+=`, rather than `+`. I don't know why people voted it when it was not entirely correct :P – Nawaz Feb 19 '12 at 16:42
  • 4
    Also a pedantic note, it'd probably be better to do `if (!text.empty())` than `if (text != "")` – Seth Carnegie Feb 19 '12 at 17:08
  • I think you can improve this by doing `iss >> text` right before the `while` loop. This will remove the need for the `if else` block inside it, and you can just have `text += ' ' + s;` – Dillydill123 Jan 06 '17 at 19:27
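The suggestions in these comments (read the first word before the loop so the `if`/`else` goes away) could be combined along these lines. The function name `normalize_words` is invented for the example; the rest uses only the standard library.

```cpp
#include <sstream>
#include <string>

// Hypothetical wrapper around the stream-extraction approach:
// operator>> skips any whitespace, so only the words survive.
std::string normalize_words(const std::string& input) {
    std::istringstream iss(input);
    std::string result;
    iss >> result;  // first word; result stays empty if there is none
    std::string word;
    while (iss >> word) {
        result += ' ';
        result += word;
    }
    return result;
}
```

Appending the space and the word separately avoids the temporary string that `' ' + word` would create, though for short inputs the difference is unlikely to be noticeable.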
5
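#include <string>
#include <boost/algorithm/string/trim_all.hpp>

std::string s = "   a\t   very  \t   ugly   \t\t\t\t   string       ";
// Trims both ends and compresses each internal whitespace run to a single character.
boost::algorithm::trim_all(s);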
caktux
4

Most of what I'd do is similar to what @Nawaz already posted -- read strings from an istringstream to get the data without whitespace, and then insert a single space between each of those strings. However, I'd use an infix_ostream_iterator from a previous answer to get (IMO) slightly cleaner/clearer code.

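std::istringstream buffer(input);
std::ostringstream result;  // assumed destination stream; result.str() holds the joined text

// infix_ostream_iterator is not standard; it comes from the earlier answer mentioned above.
std::copy(std::istream_iterator<std::string>(buffer),
          std::istream_iterator<std::string>(),
          infix_ostream_iterator<std::string>(result, " "));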
Jerry Coffin
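Since `infix_ostream_iterator` is not part of the standard library, here is a minimal stand-in so the snippet above can be tried out. It is an assumption about what such an iterator looks like, and the original from the referenced answer may differ in detail; `join_words` is likewise a hypothetical wrapper.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Minimal stand-in: writes the delimiter BETWEEN items, not after each one.
template <class T>
class infix_ostream_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    infix_ostream_iterator(std::ostream& os, const char* delim)
        : os_(&os), delim_(delim), first_(true) {}

    infix_ostream_iterator& operator=(const T& item) {
        if (!first_) *os_ << delim_;  // delimiter only before the 2nd, 3rd, ... item
        *os_ << item;
        first_ = false;
        return *this;
    }
    infix_ostream_iterator& operator*() { return *this; }
    infix_ostream_iterator& operator++() { return *this; }
    infix_ostream_iterator& operator++(int) { return *this; }

private:
    std::ostream* os_;
    const char* delim_;
    bool first_;
};

// Hypothetical wrapper around the std::copy call shown in the answer.
std::string join_words(const std::string& input) {
    std::istringstream buffer(input);
    std::ostringstream result;
    std::copy(std::istream_iterator<std::string>(buffer),
              std::istream_iterator<std::string>(),
              infix_ostream_iterator<std::string>(result, " "));
    return result.str();
}
```

The difference from `std::ostream_iterator` is only in where the delimiter goes: the standard one appends it after every element, which would leave a trailing space.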
1

If you check out https://svn.boost.org/trac/boost/ticket/1808, you'll see a request for (almost) this exact functionality, and a suggested implementation:

std::string trim_all ( const std::string &str ) {
    return boost::algorithm::find_format_all_copy(
        boost::trim_copy(str),
        boost::algorithm::token_finder(boost::is_space(), boost::algorithm::token_compress_on),
        boost::algorithm::const_formatter(" "));
}
Marshall Clow
  • Tried adding a code block but no luck.. adding an answer, but this is the right track I think. – caktux Apr 08 '14 at 07:42
0

Here is a possible version using regular expressions. My GCC 4.6 doesn't have regex_replace yet, but Boost.Regex can serve as a drop-in replacement:

#include <string>
#include <iostream>
// #include <regex>
#include <boost/regex.hpp>
#include <boost/algorithm/string/trim.hpp>

int main() {
  using namespace std;
  using namespace boost;
  string text = "   a\t   very  \t   ugly   \t\t\t\t   string       ";
  trim(text);
  regex pattern{"[[:space:]]+", regex_constants::egrep};
  string result = regex_replace(text, pattern, " ");
  cout << result << endl;
}
Philipp
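With a standard library that does ship a working `<regex>`, the same idea should work without Boost. The following is a sketch only: `collapse_with_regex` is a made-up name, and the trimming is done by hand so the example has no Boost dependency.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every whitespace run with one space,
// then strips the single space that may remain at either end.
std::string collapse_with_regex(const std::string& input) {
    // Default (ECMAScript) grammar; POSIX classes are allowed inside brackets.
    static const std::regex whitespace_run("[[:space:]]+");
    std::string out = std::regex_replace(input, whitespace_run, " ");
    if (!out.empty() && out.front() == ' ') out.erase(out.begin());
    if (!out.empty() && out.back() == ' ') out.pop_back();
    return out;
}
```

Constructing a `std::regex` is comparatively expensive, which is why the pattern is kept in a function-local `static` here.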