9

In my C++ application I need to remove all dots, commas, exclamation marks and to lower case the string. So far I figured out I can do it with std::erase and std::remove like this:

string content = "Some, NiceEeeE text ! right HeRe .";  

content.erase(std::remove(content.begin(), content.end(), ','), content.end());
content.erase(std::remove(content.begin(), content.end(), '.'), content.end());
content.erase(std::remove(content.begin(), content.end(), '!'), content.end());
std::transform(content.begin(), content.end(), content.begin(), ::tolower);

So my question is can I do this without iterating 4 times throught the string? Are there better ways to do this with simple C++?

Deepsy
  • 3,769
  • 7
  • 39
  • 71

4 Answers4

13

Ignoring iterations performed inside std::remove and erase (which you already do), you can use std::remove_if and provide your own custom predicate:

#include <algorithm>

content.erase(std::remove_if(content.begin(), 
                             content.end(), 
                             [](char c) 
                             { return c==','||c=='.'|| c=='!'; }
              content.end());

Then you can then use std::transform to transform the remaining string to lower case:

#include <cctype>
#include <algorithm>

std::transform(contents.begin(),
               contents.end(),
               contents.begin(),
               [] (unsigned char c) { return std::tolower(c); }));
juanchopanza
  • 223,364
  • 34
  • 402
  • 480
  • You missed `tolower`, which is easy: take `&`, and `c=tolower(c)` – Yakk - Adam Nevraumont Apr 26 '14 at 21:48
  • 3
    @Yakk, Unfortunately, that won't work since you're not allowed to modify items in a `remove_if` call (according to [this](http://en.cppreference.com/w/cpp/algorithm/remove) anyway). @juan, Look at the code in the question. It explains the intent of making the string lowercase better. – chris Apr 26 '14 at 21:49
  • Yes, that's exactly what I need. Just before marking this as the answer, is it possible to add the lowercase method somehow? – Deepsy Apr 26 '14 at 21:55
  • @Deepsy Done (untested, but I hope you get the general idea.) – juanchopanza Apr 26 '14 at 21:55
  • Thanks! That helped me alot. Tho the transform part can be reduced to `std::transform(content.begin(), content.end(), content.begin(), ::tolower);` as in the question. Thanks again. – Deepsy Apr 26 '14 at 21:58
  • 2
    The `transform` part is wrong, for two reasons. First, it takes the argument by reference, so it modifies the input immediately. Second, it has undefined behavior, since it calls the one argument form of `std::tolower` with a `char`. – James Kanze Apr 26 '14 at 22:01
  • @JamesKanze Thanks, I fixed the first mistake. As for the second, as far as I remember the problems could arise if the value of `c` is negative (assuming a signed `char`). But what would be the UB in calling that `std::tolower` with a `char`? (Edit: I have changed the lambda parameter to `unsigned char`.) – juanchopanza Apr 26 '14 at 22:06
  • 2
    @juanchopanza The standard requires a value in the range `[0...UCHAR_MAX]` (or `EOF`). On most implementations, the conversion `char` to `int` can result in values outside of this range. – James Kanze Apr 26 '14 at 22:10
  • @chris I can find no such requirement in the current provisional standard: just `pred(*i)` has to be valid and that it will be evaluated exactly once. If afraid, a naive reimplementation has no issues with changing elements as you remove. – Yakk - Adam Nevraumont Apr 26 '14 at 22:24
  • 1
    @Yakk, I looked and the closest thing I could find was the fact that it could be moved from. I'm not sure if it might say this at a broader scope, though. Actually, § 25.1 [algorithms.general]/4 says *For purposes of determining the existence of data races, algorithms shall not modify objects referenced through an iterator argument unless the specification requires such modification.* Since the `remove_if` spec doesn't require it, I think this would apply. – chris Apr 26 '14 at 22:27
  • 2
    @Yakk and chris: Even more directly: 25.1/8, which specifically puts the restriction on template types named "Predicate". *"The function object pred shall not apply any non-constant function through the dereferenced iterator"* – Benjamin Lindley Apr 26 '14 at 22:31
  • Thanks @benj -- did not know that! Which means if you wamt one pass, implement a naive `remove_if` (or removing for-each) (that moves from the right), note that the restriction on `std::remove_if` is pointless, and bob is your uncle. – Yakk - Adam Nevraumont Apr 26 '14 at 23:34
2

Try this

string result;
for (int loop = 0; loop < content.length(); ++loop) {
     switch (content[loop]) {
        case ',':
        case '!':
        case '.':
            break;
        default:
           result += static_case<unsigned char>(tolower(content[loop]));
     }
}
Ed Heal
  • 59,252
  • 17
  • 87
  • 127
  • 1
    Which doesn't really answer the question, which asked about using standard algorithms. And has undefined behavor, because it calls the one argument form of `tolower` with a `char`. – James Kanze Apr 26 '14 at 22:03
  • 1
    @JamesKanze - Where in the question does it ask to use standard algorithms (i.e. quote "So my question is can I do this without iterating 4 times throught the string?"). BTW - How is `tolower` underfined – Ed Heal Apr 26 '14 at 22:09
  • It seemed implicit from his example code. And according to the standard, `tolower` takes an `int` argument which must be in the range `[0...UCHAR_MAX]`. The implicit conversion of `char` to `int` will result in values which don't meet this constraint, at least on most implementations. – James Kanze Apr 26 '14 at 22:12
  • 1
    @KarolyHorvath Actually, I like it too (except for the undefined behavior, which is easily fixed by casting the `char` to `unsigned char` before calling `tolower`. But it doesn't really seem to be what the original poster is looking for. – James Kanze Apr 26 '14 at 22:13
  • For the record, I like this one too. I'll up-vote once the `tolower` problem is fixed ;-) – juanchopanza Apr 26 '14 at 22:15
  • Whoops - Missed the cast. It is exactly what the poster wanted. It is C++ code. It does not iterate 4 times – Ed Heal Apr 26 '14 at 22:16
1

If you want to do this in a single pass, it's pretty easy to do with a standard for loop. Using standard library routines might be preferred in general, but if you want it done in a single pass and there's not a good fit in the library, then I see no harm in just using a loop.

#include <iostream>
#include <ostream>
#include <string>

using namespace std;

int main()
{
    string exclude_chars(",.!");
    string content = "Some, NiceEeeE text ! right HeRe .";  

    auto write_iter = content.begin();

    for (auto read_iter = content.begin(); read_iter != content.end(); ++read_iter) {
        auto c = *read_iter;

        if (exclude_chars.find(c) != string::npos) continue;

        *write_iter = tolower( (unsigned char) c);
        ++write_iter;
    }

    content.erase(write_iter, content.end());

    cout << content << endl;
}

If you need this functionality in more than one pace and/or need the exclusion characters or transformation to be parameterized, then its also pretty easy to turn that snippet of code into a function that takes those things as argument.

For example, here's a template function that does the filter and transform in one pass:

#include <ctype.h>
#include <iostream>
#include <ostream>
#include <string>

template <class InputIter, class OutputIter, class UnaryOp, class UnaryPred>
OutputIter filter_and_transform(
                    InputIter first, 
                    InputIter last,
                    OutputIter result, 
                    UnaryPred pred,
                    UnaryOp op)
{
    while (first!=last) {
        if (pred(*first)) {
            *result = op(*first);
            ++result;
        }
        ++first;
    }

    return result;
}


int main()
{
    std::string exclude_chars(",.!");
    std::string content = "Some, NiceEeeE text ! right HeRe .";  

    content.erase( 
        filter_and_transform( begin(content), end(content), 
                              begin(content),
                              [](char c) {
                                    return std::string(",.!").find(c) == std::string::npos;
                              },
                              [](char c) -> char {
                                    return tolower((unsigned char) c);
                              }),
        end(content)
     );

    std::cout << content << std::endl;
}

It's more generic, but I'm not convinced it's more readable.


Update (29 Apr 2014)

I decided to play around with the idea of having a custom filter_iterator<> perform the filtering, and when I got frustrated over the amount of boilerplate code I had to get working I figured I'd look into whether Boost had anything similar. Sure enough boost has exactly that data type and a transform_iterator that can be composed together to get the following alternate single pass filter-and-transform operation:

// boost::transform_iterator<> might need the following define
//  in order to work with lambdas (see http://stackoverflow.com/questions/12672372)
#define BOOST_RESULT_OF_USE_DECLTYPE

#include <algorithm>
#include <ctype.h>
#include <iostream>
#include <ostream>
#include <string>

#include "boost/iterator/filter_iterator.hpp"
#include "boost/iterator/transform_iterator.hpp"

/*
    relaxed_copy<>() works like std::copy<>() but is safe to use in 
    situations where result happens to be equivalent to first.

    std::copy<> requires that result not be in the range [first,last) - it's
    understandable that result cannot be in the range [first,last) in general,
    but it should be safe for the specific situation where result == first.
    However, the standard doesn't allow for this particular exception, so 
    relaxed_copy<>() exists to be able to safely handle that scenario.

*/
template <class InputIter, class OutputIter>
OutputIter relaxed_copy(
                InputIter first, 
                InputIter last,
                OutputIter result)
{
    while (first!=last) {
        *result = *first;
        ++first;
        ++result;
    }

    return result;
}


int main()
{
    std::string exclude_chars(",.!");
    std::string content = "Some, NiceEeeE text ! right HeRe .";  

    // set up filter_iterators over the string to filter out ",.!" characters
    auto filtered_first = 
        boost::make_filter_iterator(
            [&exclude_chars](char c) {
                return exclude_chars.find(c) == std::string::npos;
            },
            begin(content),
            end(content)
        );

    auto filtered_last = 
        boost::make_filter_iterator( 
            filtered_first.predicate(), 
            end(content)
        );

    // set up transform_iterators 'on top of' the filter_iterators
    //  to transform the filtered characters to lower case
    auto trans_first = 
        boost::make_transform_iterator( 
            filtered_first, 
            [](char c) -> char {
                return tolower((unsigned char) c);
            }
        );

    auto trans_last  = 
        boost::make_transform_iterator( 
            filtered_last, 
            trans_first.functor()
        );

     // now copy using the composed iterators and erase any leftovers
     content.erase( 
        relaxed_copy( trans_first, trans_last, begin(content)),
        end(content)
     );


    std::cout << content << std::endl;
}

I think this is a pretty nifty technique, but I still think it might be hard to argue that it's understandable at a glance what's going on.

Michael Burr
  • 333,147
  • 50
  • 533
  • 760
1

This sounds like a conditional std::transform so you could do:

template <typename InIt, typename OutIt, typename UnOp, typename Pred>
OutIt transform_if(InIt first, InIt last, OutIt dest, UnOp op, Pred pr)
{
    while (first != last) {
        if (pr(*first)) {
            *dest = op(*first);
            ++dest;
        }
        ++first;
    }
    return dest;
}

Usage in this case would be:

content.erase(transform_if(
    content.begin(), content.end(),
    content.begin(),
    [](char c){ return std::tolower(c, std::locale()); },
    [](char c){ return !(c == ',' || c == '.'); }
), content.end());
Blastfurnace
  • 18,411
  • 56
  • 55
  • 70