Is there a way to use the erase-remove idiom in concert with other looping conditions?

Question

I have a function that, given a block of text, should remove all punctuation characters, and make all letters lowercase, and eventually, should shift them according to a mono alphabetic cipher. The code below works:

class Cipher { 
  public:

  string keyword; 

  string decipheredText;

  deque<string> encipheredAlphabet;

    static bool is_punctuation (char c) {
      return c ==  '.' || c == ',' || c == '!' || c == '\''|| c == '?' || c 
      == ' ';
    }

  string encipher(string text) { 
    Alphabet a;
    encipheredAlphabet = a.cipherLetters(keyword);


    text.erase( remove_if(text.begin(), text.end(), is_punctuation), 
    text.end() );

    string::iterator it;
    for (it = text.begin(); it != text.end(); it++) { 
      *it = tolower(*it); 
      // encipher text according to shift
    }

    return text;
  }

};

The problem is, it currently makes two passes over the string, one to remove the punctuation, and one to do all the other stuff. This seems inefficient, since it seems like all the transformations could be accomplished in one pass through the string somehow. Is there a clean way to incorporate the erase-remove idiom with other loop conditions?

Sure. Just write your own algorithm: `remove_if_tolowercase_otherwise`. Look at how `std::remove_if` is implemented and tweak it. — Pete Becker, Dec 28 '18 at 21:22
*This seems inefficient* -- Profile the code. Going over a container `n` times doing one operation per trip, or performing `n` operations in one trip, the complexity is theoretically the same. — PaulMcKenzie, Dec 28 '18 at 21:24
@PaulMcKenzie in addition to being *theoretically* the same - the complexity *is* the same. However, that doesn't say anything about how efficient it is. — xaxxon, Dec 28 '18 at 21:30
Yes, you're right. What I meant that the complexity is the same, but the real world efficiency may not be. But that's why the code should be profiled before trying to make the loop "too elegant" for one to understand what it's doing. — PaulMcKenzie, Dec 28 '18 at 21:32
Code readability is important, and two loops are quite readable since they do two distinct tasks. You can replace that `for` loop with [std::transform](https://stackoverflow.com/questions/313970/how-to-convert-stdstring-to-lower-case). — 1201ProgramAlarm, Dec 28 '18 at 21:35
The second pass would look less redundant and prettier as `for(auto &c : text)`. — Davis Herring, Dec 28 '18 at 21:39
It seems the consensus is that doing two loops is not necessarily less efficient, so I guess it makes sense to implement the whole function, then see what it looks like before changing the structure. Thanks everyone! — adam tropp, Dec 28 '18 at 22:18

score 2 · Answer 1 · answered Dec 28 '18 at 22:45

2

With range-v3, you might create (lazy) view:

return text | ranges::view::filter([](char c){ return !is_punctuation(c); })
            | ranges::view::transform([](char c) -> char { return to_lower(c); });

answered Dec 28 '18 at 22:45

Jarod42

203,559
14
181
302

score 1 · Answer 2 · answered Dec 28 '18 at 22:08

1

You could do it by using std::accumulate and an iterator as init value that insert into an output std::string

auto filter = [](auto pred) {
    return [=](auto map) {
        auto accumulator = [=](auto it, auto c) {
            if (pred(c)) {
                *it = map(c);
            }
            return ++it;
        };
        return accumulator;
    };
};

auto accumulator = filter(std::not_fn(is_punctuation))
([](auto c) {
    return std::tolower(c);
});

std::string in = "insIsjs.|s!js";
std::string out;
std::accumulate(std::begin(in), std::end(in), std::back_inserter(out), accumulator);

See demo

answered Dec 28 '18 at 22:08

Jans

11,064
3
37
45

1

I can't say that I understand this solution. The triply nested lambda with opaque names makes it hard for me to understand what this is doing. It's also peculiar to use `std::accumulate` to do something which really doesn't feel much like accumulating something, especially since you're passing an iterator in a position which feels invalid for `accumulate`. Rather than going through this, I'd strongly prefer some sort of `transform_if` or `transform_remove_if` – Justin Dec 28 '18 at 22:24
@Justin that code model `transform_if` and the fact that the init value is an iterator don't remove the *fact* that is an accumulation, just than in this case, you're accumulating into an *iterator*, but I'll unwrap the lambdas so it can be more legible – Jans Dec 28 '18 at 22:29
Exactly. It does do a `transform_if`, but in a confusing, roundabout way. The fact that the init value is an iterator is confusing because it looks like a mistake in thinking that `accumulate` has an output iterator parameter. – Justin Dec 28 '18 at 22:31
There are a lot of `std` algorithm that receive an output iterator and some follows the same idiom as above, here what is needed is a different way of thinking about the init value of accumulation, you might find some data stream libraries that also treat collections as init values for accumulation. – Jans Dec 28 '18 at 22:36

Sid S · Answer 3 · 2018-12-28T23:12:44.677

1

Copy and/or modify characters, then truncate the string :

string encipher(string text)
{
    auto it = text.begin(),
         jt = it;
    for (; it != text.end(); it++)
    {
        if (!is_punctuation(*it))
        {
            *jt = tolower(*it);
            ++jt;
        }
    }
    text.erase(jt, it);
    return text;
}

edited Dec 28 '18 at 23:12

answered Dec 28 '18 at 22:33

Sid S

6,037
2
18
24

Justin · Accepted Answer · 2019-01-03T17:36:46.733

If you don't want to do two loops because you've measured and found that it's slower, write a custom algorithm:

template <typename Iter, typename OutIter>
OutIter lowercased_without_punctuation(Iter begin, Iter end, OutIter out) {
    while (begin != end) {
        // Ignoring things like std::move_iterator for brevity.
        if (!is_punctuation(*begin)) {
            *out = tolower(*begin);
            ++out;
        }

        // Use `++iter` rather than `iter++` when possible
        ++begin;
    }

    return out;
}

// ...

string encipher(string text) {
    Alphabet a;
    encipheredAlphabet = a.cipherLetters(keyword);

    text.erase(
        lowercased_without_punctuation(text.begin(), text.end(), text.begin()),
        text.end());

    return text;
}

If you think about it some more, lowercased_without_punctuation is actually a special-case of a more general algorithm which might be called transform_if (relevant Q&A):

template <typename Iter, typename OutIter, typename Pred, typename Transf>
OutIter transform_if(Iter begin, Iter end, OutIter out, Pred p, Transf t) {
    while (begin != end) {
        if (p(*begin)) {
            *out = t(*begin);
            ++out;
        }

        ++begin;
    }

    return out;
}

// ...

string encipher(string text) {
    Alphabet a;
    encipheredAlphabet = a.cipherLetters(keyword);

    text.erase(
        transform_if(text.begin(), text.end(), text.begin(),
            [](char c) { return !is_punctuation(c); },
            [](char c) { return tolower(c); }),
        text.end());

    return text;
}

There's a reason why we teach erase-remove instead of erase-in-a-loop. Every `erase` in your code needs to shift the entire remainder of the string. — T.C., Dec 28 '18 at 22:39
@T.C. — exactly. This code has only one **explicit** loop, but its complexity is O(n^2), while the original version was O(n). — Pete Becker, Dec 29 '18 at 01:36
@T.C. I was too busy thinking about correctness, that I entirely forgot time complexity. If I could delete or downvote this answer, I would — Justin, Dec 29 '18 at 06:28
@Justin: You can fix your answer by using "input" and "output" iterator, so you won't call erase, but just skip punctuation. — Jarod42, Dec 29 '18 at 20:01

Is there a way to use the erase-remove idiom in concert with other looping conditions?

4 Answers4