3

I'm trying to remove all characters and spaces except letters. But the "erase spaces" part doesn't take effect, it will only take effect if I comment out the remove characters part.

for (int i = 0; i < s.size(); i++)
    {
        if (!(s[i] >= 'a' && s[i] <= 'z' || s[i] >= 'A' && s[i] <= 'Z'))
        {
            s[i] = '\0';
        }

    }

s.erase(remove(s.begin(), s.end(), ' '), s.end());
EylM
  • 5,967
  • 2
  • 16
  • 28
  • 2
    `\0` is null, not a space. –  Aug 03 '19 at 18:13
  • '\0' is not what u want, you can use '' or ' ' depending on what you want. take a look at this for '\0': https://stackoverflow.com/a/4711475/4022530 – Majid khalili Aug 03 '19 at 18:18
  • 1
    @Majidkhalili Why is '\0' "not what u want"? Your link is about string literals & C strings, something completely different. – Lightness Races in Orbit Aug 03 '19 at 18:19
  • @LightnessRacesinOrbit as far as I understood he wants to remove the characters that are not letters, not replacing them with null or anything else, the link is just to say what '\0' is – Majid khalili Aug 03 '19 at 18:38
  • 1
    @Majidkhalili It looks like you did not fully understand the program. First the OP replaces unwanted characters with a "sentinel" or "placeholder" value (and has chosen `\0` for this, apparently already understanding what that is, and understanding that `std::string` is null-safe). Then, they efficiently remove the placeholder characters using the erase-remove idiom. You claimed _"`\0` is not what u want"_ but there does not seem to be any basis for this claim. Aside from the typo (and I'm not sure about those `&&`/`||` offhand), the code in the question is fine. – Lightness Races in Orbit Aug 03 '19 at 18:44
  • 1
    I would use `remove_if` instead of `remove`. Then this can be accomplished in one step instead of two. – john Aug 03 '19 at 19:15
  • @LightnessRacesinOrbit Sure, u r right, my mistake – Majid khalili Aug 03 '19 at 19:29

3 Answers3

3

You're replacing all the non-alphabetic characters with NULs, then removing all the spaces. Since NULs are not spaces, this latter step does nothing. If you change the assignment in the loop to

s[i] = ' ';

you would instead replace them with spaces, which would then be removed by the eraser(remove

If you want to make the code more readable, you could replace the complex if with

if (!isalpha(s[i]))

or you could even replace the whole thing with

s.erase(remove_if(s.begin(), s.end(), [](char ch){ return !isalpha(ch); });
Chris Dodd
  • 119,907
  • 13
  • 134
  • 226
  • 1
    But now your program is harder to debug, because there is no distinction between "a previously-valid character that I failed to replace for some reason" and "a character that I failed to remove for some reason". A `\0` is honestly a good choice; it just needs to be used consistently. – Lightness Races in Orbit Aug 03 '19 at 18:22
  • Spaces are easier to see in a debugger output than NULs, usually. `'\0'` has the problem that it may be invisible if output. IMO any printable character would be easier to debug than a NUL... – Chris Dodd Aug 03 '19 at 18:28
  • Any debugger that renders nulls as "invisible" should be shot. Have you encountered one that does? – Lightness Races in Orbit Aug 03 '19 at 18:32
  • Besides, you may be debugging by simply printing the numeric value of each character in the string, which will tell you immediately exactly what is going on in the event of a bug. – Lightness Races in Orbit Aug 03 '19 at 18:34
  • That's the case that is most difficult -- if you simply output the string (to cout) the NULs will be invisible. – Chris Dodd Aug 03 '19 at 18:34
  • Why would you emit the actual characters if you're debugging? You should emit their numeric values... – Lightness Races in Orbit Aug 03 '19 at 18:34
  • That's a lot more work than a simple `cout << s << flush;` added to the code. – Chris Dodd Aug 03 '19 at 18:36
  • It's absolutely trivial (`for (const auto ch : s) cerr << hex << +ch << ' '; cerr << '\n';`) and you should already have a function to do this (ideally one that also emits the printables alongside, [ala `hexdump`](https://gist.github.com/tomalakgeretkal/594da2d34c51c3642a6f844333ae9e52)) in your toolkit. Either way, you're arguing to do something in an ambiguous way because you have inadequate diagnostic tools, which is a backwards way to program. – Lightness Races in Orbit Aug 03 '19 at 18:37
1

So you replaced the characters you don't want with '\0'.

Then you removed all ' ' characters.

That last stage presumably should involve '\0'

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
  • Except wouldn't a null prematurely end the string, or is `std::string` smarter than that? –  Aug 03 '19 at 18:15
  • @Chipster `std::string` is perfectly null-safe. – Lightness Races in Orbit Aug 03 '19 at 18:18
  • 1
    @Chipster A `std::string` knows its size and can hold embedded nulls. It's *not* a C-style null terminated string. – Jesper Juhl Aug 03 '19 at 18:19
  • 1
    A null is the most appropriate character to use here anyway, compared to some alternative like a space. Not only is it semantically reasonable, but it'll then be easier to distinguish "characters I forgot to replace" with "characters I forgot to remove" when you're debugging a problem. – Lightness Races in Orbit Aug 03 '19 at 18:21
  • @LightnessRacesinOrbit A null wouldn't be the best here if it terminated at null like a c string would, since it'd become impossible to know the actual length from just the pointer. You couldn't really remove NULL then. But, since you guys have cleared up that that is not the case, then yes, NULL is the best here. +1, btw. –  Aug 03 '19 at 18:26
0

For the benefit of future readers: in C++20, we have unified erasure, so we can simply use

std::erase_if(s, [](unsigned char c) { return !std::isalpha(ch); });

(See Do I need to cast to unsigned char before calling toupper(), tolower(), et al.? for why unsigned char should be used)

L. F.
  • 19,445
  • 8
  • 48
  • 82