3

I have a function that, given a block of text, should remove all punctuation characters, and make all letters lowercase, and eventually, should shift them according to a mono alphabetic cipher. The code below works:

class Cipher { 
  public:

  string keyword; 

  string decipheredText;

  deque<string> encipheredAlphabet;

    static bool is_punctuation (char c) {
      return c ==  '.' || c == ',' || c == '!' || c == '\''|| c == '?' || c 
      == ' ';
    }

  string encipher(string text) { 
    Alphabet a;
    encipheredAlphabet = a.cipherLetters(keyword);


    text.erase( remove_if(text.begin(), text.end(), is_punctuation), 
    text.end() );

    string::iterator it;
    for (it = text.begin(); it != text.end(); it++) { 
      *it = tolower(*it); 
      // encipher text according to shift
    }

    return text;
  }

};

The problem is, it currently makes two passes over the string, one to remove the punctuation, and one to do all the other stuff. This seems inefficient, since it seems like all the transformations could be accomplished in one pass through the string somehow. Is there a clean way to incorporate the erase-remove idiom with other loop conditions?

adam tropp
  • 674
  • 7
  • 22
  • 2
    Sure. Just write your own algorithm: `remove_if_tolowercase_otherwise`. Look at how `std::remove_if` is implemented and tweak it. – Pete Becker Dec 28 '18 at 21:22
  • *This seems inefficient* -- Profile the code. Going over a container `n` times doing one operation per trip, or performing `n` operations in one trip, the complexity is theoretically the same. – PaulMcKenzie Dec 28 '18 at 21:24
  • @PaulMcKenzie in addition to being *theoretically* the same - the complexity *is* the same. However, that doesn't say anything about how efficient it is. – xaxxon Dec 28 '18 at 21:30
  • 1
    Yes, you're right. What I meant that the complexity is the same, but the real world efficiency may not be. But that's why the code should be profiled before trying to make the loop "too elegant" for one to understand what it's doing. – PaulMcKenzie Dec 28 '18 at 21:32
  • 2
    Code readability is important, and two loops are quite readable since they do two distinct tasks. You can replace that `for` loop with [std::transform](https://stackoverflow.com/questions/313970/how-to-convert-stdstring-to-lower-case). – 1201ProgramAlarm Dec 28 '18 at 21:35
  • 1
    The second pass would look less redundant and prettier as `for(auto &c : text)`. – Davis Herring Dec 28 '18 at 21:39
  • It seems the consensus is that doing two loops is not necessarily less efficient, so I guess it makes sense to implement the whole function, then see what it looks like before changing the structure. Thanks everyone! – adam tropp Dec 28 '18 at 22:18

4 Answers4

2

With range-v3, you might create (lazy) view:

return text | ranges::view::filter([](char c){ return !is_punctuation(c); })
            | ranges::view::transform([](char c) -> char { return to_lower(c); });
Jarod42
  • 203,559
  • 14
  • 181
  • 302
1

You could do it by using std::accumulate and an iterator as init value that insert into an output std::string

auto filter = [](auto pred) {
    return [=](auto map) {
        auto accumulator = [=](auto it, auto c) {
            if (pred(c)) {
                *it = map(c);
            }
            return ++it;
        };
        return accumulator;
    };
};

auto accumulator = filter(std::not_fn(is_punctuation))
([](auto c) {
    return std::tolower(c);
});

std::string in = "insIsjs.|s!js";
std::string out;
std::accumulate(std::begin(in), std::end(in), std::back_inserter(out), accumulator);

See demo

Jans
  • 11,064
  • 3
  • 37
  • 45
  • 1
    I can't say that I understand this solution. The triply nested lambda with opaque names makes it hard for me to understand what this is doing. It's also peculiar to use `std::accumulate` to do something which really doesn't feel much like accumulating something, especially since you're passing an iterator in a position which feels invalid for `accumulate`. Rather than going through this, I'd strongly prefer some sort of `transform_if` or `transform_remove_if` – Justin Dec 28 '18 at 22:24
  • @Justin that code model `transform_if` and the fact that the init value is an iterator don't remove the *fact* that is an accumulation, just than in this case, you're accumulating into an *iterator*, but I'll unwrap the lambdas so it can be more legible – Jans Dec 28 '18 at 22:29
  • Exactly. It does do a `transform_if`, but in a confusing, roundabout way. The fact that the init value is an iterator is confusing because it looks like a mistake in thinking that `accumulate` has an output iterator parameter. – Justin Dec 28 '18 at 22:31
  • There are a lot of `std` algorithm that receive an output iterator and some follows the same idiom as above, here what is needed is a different way of thinking about the init value of accumulation, you might find some data stream libraries that also treat collections as init values for accumulation. – Jans Dec 28 '18 at 22:36
1

Copy and/or modify characters, then truncate the string :

string encipher(string text)
{
    auto it = text.begin(),
         jt = it;
    for (; it != text.end(); it++)
    {
        if (!is_punctuation(*it))
        {
            *jt = tolower(*it);
            ++jt;
        }
    }
    text.erase(jt, it);
    return text;
}
Sid S
  • 6,037
  • 2
  • 18
  • 24
0

If you don't want to do two loops because you've measured and found that it's slower, write a custom algorithm:

template <typename Iter, typename OutIter>
OutIter lowercased_without_punctuation(Iter begin, Iter end, OutIter out) {
    while (begin != end) {
        // Ignoring things like std::move_iterator for brevity.
        if (!is_punctuation(*begin)) {
            *out = tolower(*begin);
            ++out;
        }

        // Use `++iter` rather than `iter++` when possible
        ++begin;
    }

    return out;
}

// ...

string encipher(string text) {
    Alphabet a;
    encipheredAlphabet = a.cipherLetters(keyword);

    text.erase(
        lowercased_without_punctuation(text.begin(), text.end(), text.begin()),
        text.end());

    return text;
}

If you think about it some more, lowercased_without_punctuation is actually a special-case of a more general algorithm which might be called transform_if (relevant Q&A):

template <typename Iter, typename OutIter, typename Pred, typename Transf>
OutIter transform_if(Iter begin, Iter end, OutIter out, Pred p, Transf t) {
    while (begin != end) {
        if (p(*begin)) {
            *out = t(*begin);
            ++out;
        }

        ++begin;
    }

    return out;
}

// ...

string encipher(string text) {
    Alphabet a;
    encipheredAlphabet = a.cipherLetters(keyword);

    text.erase(
        transform_if(text.begin(), text.end(), text.begin(),
            [](char c) { return !is_punctuation(c); },
            [](char c) { return tolower(c); }),
        text.end());

    return text;
}
Justin
  • 24,288
  • 12
  • 92
  • 142
  • Thanks, this looks pretty easy to understand. – adam tropp Dec 28 '18 at 22:17
  • 4
    There's a reason why we teach erase-remove instead of erase-in-a-loop. Every `erase` in your code needs to shift the entire remainder of the string. – T.C. Dec 28 '18 at 22:39
  • 3
    @T.C. — exactly. This code has only one **explicit** loop, but its complexity is O(n^2), while the original version was O(n). – Pete Becker Dec 29 '18 at 01:36
  • @T.C. I was too busy thinking about correctness, that I entirely forgot time complexity. If I could delete or downvote this answer, I would – Justin Dec 29 '18 at 06:28
  • 1
    @Justin: You can fix your answer by using "input" and "output" iterator, so you won't call erase, but just skip punctuation. – Jarod42 Dec 29 '18 at 20:01