1

I'm having some issues with a part of a C++ program I'm writing to encode and decode a Playfair cipher. The part of the program I'm having issues with is the first step (unfortunate, I know), which involves removing all whitespace, punctuation, and non-alphabet characters. This is what I have:

std::string step1(std::string plaintext)
{
    plaintext.erase(remove_if(plaintext.begin(), plaintext.end(),
        std::not1(std::ptr_fun(std::isalpha))));
    plaintext.erase(remove_if(plaintext.begin(), plaintext.end(),
        std::ptr_fun(std::ispunct)));
    return plaintext;
}

The output is good, except it adds characters at the end of the string once cleaned. Usually, the extra characters are copies of characters found at the end of the cleaned input. I can't, for the life of me, figure out why this is happening. Any ideas?

Drift476
  • 15
  • 3

2 Answers2

3

std::remove/_if() does not actually remove anything, it just moves matching items to the end of the container, and then returns an iterator to the beginning of that range of items. The caller can then use that iterator to actually remove the items from the container.

You are passing that iterator to the overload of std::string::erase() that takes 1 iterator as input. Thus, it will erase only 1 character at most (if std::remove_if() did not find anything, it would return the string's end() iterator, and calling erase() on that iterator is undefined behavior). If more than 1 character were "removed" to the end of the string, the remaining un-erased characters would be left behind. That is what you are seeing in your output.

To erase all of the "removed" characters, you need to use the overload of std::string::erase() that takes 2 iterators denoting a range, eg:

std::string step1(std::string plaintext)
{
    plaintext.erase(
        std::remove_if(plaintext.begin(), plaintext.end(),
            std::not1(std::ptr_fun(std::isalpha))
        ),
        plaintext.end()
    );
    plaintext.erase(
        std::remove_if(plaintext.begin(), plaintext.end(),
            std::ptr_fun(std::ispunct)
        ),
        plaintext.end()
    );
    return plaintext;
}

Note that removing all non-alpha characters using std:::isalpha() will include all whitespace and punctuation characters, thus your 2nd search using std::ispunct() will not find anything to remove, so you can just drop that search entirely, eg:

std::string step1(std::string plaintext)
{
    plaintext.erase(
        std::remove_if(plaintext.begin(), plaintext.end(),
            std::not1(std::ptr_fun(std::isalpha))
        ),
        plaintext.end()
    );
    return plaintext;
}

That being said, as noted in comments, std::not1() and std::ptr_fun are deprecated in modern C++. In C++11 and later, you can use a lambda instead:

std::string step1(std::string plaintext)
{
    plaintext.erase(
        std::remove_if(plaintext.begin(), plaintext.end(),
            [](unsigned char ch){ return !std::isalpha(ch); }
        ),
        plaintext.end()
    );
    return plaintext;
}

In C++20 and later, you can use std::erase_if():

std::string step1(std::string plaintext)
{
    std::erase_if(plaintext,
        [](unsigned char ch){ return !std::isalpha(ch); }
    );
    return plaintext;
}
Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
  • 4
    Although the question is tagged C++11, it may be worth noting a few things. Firstly, `std::ptr_fun` was already deprecated in C++11, and was removed in C++17. Secondly, `std::not1` was deprecated in C++17 and is removed in C++20. And finally, since C++20, taking an address of a standard library function, such as `std::isalpha`, leads to unspecified behavior and may fail to compile (the only exceptions are I/O-manipulators specifically marked as _addressable functions_). For rationale, see [this answer](https://stackoverflow.com/a/62252164/1458097). So, in new code, it's better to use a lambda. – heap underrun Oct 14 '21 at 00:31
0

Maybe it is too obvious. But there is a dedicated function for such a task. It is called: std::regex_replace.

This is very versatile and easy to use.

With a regex, you can define easily, what you want to replace or not.

Please see:

#include <iostream>
#include <string>
#include <regex>

int main() {

    std::string test{"  ! Hello World 1234 ,;.++"};

    std::cout << std::regex_replace(test, std::regex(R"([^a-zA-Z]+)"), "");
}

A M
  • 14,694
  • 5
  • 19
  • 44