How to test a string for letters only

Question

How could I test that a string contains only valid characters, such as the letters a-z?

string name;

cout << "Enter your name";
cin >> name;

string letters = "qwertyuiopasdfghjklzxcvbnm";

string::iterator it;

for(it = name.begin(); it != name.end(); it++)
{
  size_t found = letters.find(*it);
}

What's your question? Is your code wrong, or is something missing from it? In which case, what? — JB., Sep 30 '11 at 22:58
Just pointing out that with ascii text you only have to check 2 ranges per character - (c >= 0x61 && c <= 0x71). Some of these solutions don't take advantage of that fact. — Pubby, Sep 30 '11 at 23:33

Jon Purdy · Answer 1 · 2011-10-01T16:00:03.497

First, using std::cin >> name will read only John if the user enters John Smith, because >> stops at whitespace characters. You should use std::getline() to get the whole name:

std::getline(std::cin, name);

Here we go…

There are a number of ways to check that a string contains only alphabetic characters. The simplest is probably s.find_first_not_of(t), which returns the index of the first character in s that is not in t:

bool contains_non_alpha
    = name.find_first_not_of("abcdefghijklmnopqrstuvwxyz") != std::string::npos;

That rapidly becomes cumbersome, however. To also match uppercase alphabetic characters, you’d have to add 26 more characters to that string! Instead, you may want to use a combination of find_if from the <algorithm> header and std::isalpha from <cctype>:

#include <algorithm>
#include <cctype>

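struct non_alpha {
    bool operator()(char c) const {
        // Cast first: std::isalpha is undefined for negative char values.
        return !std::isalpha(static_cast<unsigned char>(c));
    }
};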

bool contains_non_alpha
    = std::find_if(name.begin(), name.end(), non_alpha()) != name.end();

find_if searches a range for a value that matches a predicate, in this case a functor non_alpha that returns whether its argument is a non-alphabetic character. If find_if(name.begin(), name.end(), ...) returns name.end(), then no match was found.
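
As a minimal sketch, the functor approach can be wrapped in a reusable function. The name is_alpha_only is hypothetical and not part of the original answer, and the cast to unsigned char is an addition, since passing a negative char to std::isalpha is undefined behaviour.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Predicate: true for any character that is not alphabetic in the current C locale.
struct non_alpha {
    bool operator()(char c) const {
        // std::isalpha requires a value representable as unsigned char.
        return !std::isalpha(static_cast<unsigned char>(c));
    }
};

// Hypothetical helper: true if no character of s is non-alphabetic.
// An empty string yields true because there is no offending character.
bool is_alpha_only(const std::string& s) {
    return std::find_if(s.begin(), s.end(), non_alpha()) == s.end();
}
```

Note that an empty string passes this check; if an empty name should be rejected, test s.empty() separately.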

But there’s more!

To do this as a one-liner, you can use the adaptors from the <functional> header:

#include <algorithm>
#include <cctype>
#include <functional>

bool contains_non_alpha
    = std::find_if(name.begin(), name.end(),
                   std::not1(std::ptr_fun((int(*)(int))std::isalpha))) != name.end();

std::not1 produces a function object that returns the logical inverse of its input; by supplying a pointer to a function with std::ptr_fun(...), we can tell std::not1 to produce the logical inverse of std::isalpha. The cast (int(*)(int)) is there to select the overload of std::isalpha which takes an int (treated as a character) and returns an int (treated as a Boolean). Note that std::ptr_fun was deprecated in C++11 and removed in C++17, and std::not1 was deprecated in C++17 and removed in C++20, so this form is only suitable for older code.

Or, if you can use a C++11 compiler, using a lambda cleans this up a lot:

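#include <algorithm>
#include <cctype>

// The cast avoids undefined behaviour for negative char values.
bool contains_non_alpha
    = std::find_if(name.begin(), name.end(),
                   [](char c) { return !std::isalpha(static_cast<unsigned char>(c)); }) != name.end();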

[](char c) -> bool { ... } denotes a function that accepts a character and returns a bool. In our case we can omit the -> bool return type because the function body consists of only a return statement. This works just the same as the previous examples, except that the function object can be specified much more succinctly.
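
The same check can arguably be stated more directly with std::any_of, which C++11 added to <algorithm>. The following is a sketch only; the function name has_non_alpha is hypothetical, and the lambda takes unsigned char so that std::isalpha never receives a negative value.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if s contains at least one non-alphabetic character.
bool has_non_alpha(const std::string& s) {
    // Each char is converted to unsigned char before reaching std::isalpha.
    return std::any_of(s.begin(), s.end(), [](unsigned char c) {
        return !std::isalpha(c);
    });
}
```

For an empty string std::any_of returns false, so has_non_alpha("") is false.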

And (almost) finally…

In C++11 you can also use a regular expression to perform the match:

#include <regex>

bool contains_non_alpha
    = !std::regex_match(name, std::regex("^[A-Za-z]+$"));
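
A small sketch of the regex check as a function, assuming only ASCII letters should be accepted. The name matches_letters_only is hypothetical. Because std::regex_match must match the entire string, the ^ and $ anchors are left out here, and the regex object is built once rather than on every call.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if s consists of one or more ASCII letters.
bool matches_letters_only(const std::string& s) {
    // regex_match requires the whole string to match, so no anchors are needed.
    static const std::regex letters("[A-Za-z]+");
    return std::regex_match(s, letters);
}
```

Unlike the find_if versions, this rejects the empty string, because + requires at least one letter.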

But of course…

None of these solutions addresses the issue of locale or character encoding! For a locale-aware version of isalpha(), you’d need to use the C++ header <locale>:

#include <locale>

bool isalpha(char c) {
    std::locale locale; // Default locale.
    return std::use_facet<std::ctype<char> >(locale).is(std::ctype<char>::alpha, c);
}

Ideally we would use char32_t, but ctype doesn’t seem to be able to classify it, so we’re stuck with char. Lucky for us we can dance around the issue of locale entirely, because you’re probably only interested in English letters. There’s a handy header-only library called UTF8-CPP that will let us do what we need to do in a more encoding-safe way. First we define our version of isalpha() that uses UTF-32 code points:

bool isalpha(uint32_t c) {
    return (c >= 0x0041 && c <= 0x005A)
        || (c >= 0x0061 && c <= 0x007A);
}

Then we can use the utf8::iterator adaptor to adapt the basic_string::iterator from octets into UTF-32 code points:

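#include <utf8.h>

// utf8::iterator is a class template over the underlying octet iterator.
typedef utf8::iterator<std::string::iterator> utf8_iterator;

utf8_iterator first(name.begin(), name.begin(), name.end());
utf8_iterator last(name.end(), name.begin(), name.end());

bool contains_non_alpha
    = std::find_if(first, last,
                   [](uint32_t c) { return !isalpha(c); }) != last;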

For slightly better performance at the cost of safety, you can use utf8::unchecked::iterator:

#include <utf8.h>

typedef utf8::unchecked::iterator<std::string::iterator> unchecked_iterator;

unchecked_iterator begin_it(name.begin());
unchecked_iterator end_it(name.end());

bool contains_non_alpha
    = std::find_if(begin_it, end_it,
                   [](uint32_t c) { return !isalpha(c); }) != end_it;

The unchecked iterator skips validation, so this will misbehave on some invalid UTF-8 input.

Using UTF8-CPP in this way assumes that the host encoding is UTF-8, or a compatible encoding such as ASCII. In theory this is still an imperfect solution, but in practice it will work on the vast majority of platforms.

I hope this answer is finally complete!

+1 I'm not sure if this approach would work in an i18n environment with non-english unicode in wchars, etc, but great response. :) — Alex Reece, Sep 30 '11 at 23:29
name.find_first_not_of returns position or npos (-1). This means the result, when casted to bool will always be true. You need to test against npos for this to make sense. — flumpb, Sep 30 '11 at 23:34
@kisplit: Thanks for catching that. Just an oversight on my part. — Jon Purdy, Oct 01 '11 at 00:21
@AlexReece: You could use the `utf8::iterator` adaptor from [UTF8-CPP](http://utfcpp.sourceforge.net/) to adapt the `basic_string::iterator` to return code points rather than raw octets, but you’d need an overload of `isalpha` that supports `uint32_t` or `char32_t`. I dunno if C++11 adds anything to `` in that regard. — Jon Purdy, Oct 01 '11 at 00:28
Re: Unicode You would consider that COMBINING diacritics are letters, right? — curiousguy, Oct 01 '11 at 02:24
@curiousguy: By themselves? No. After a code point with which they can legally combine? Sure. But then you’re getting into territory that would be best covered by ICU, and this is already an absurdly detailed answer to a simple question. — Jon Purdy, Oct 01 '11 at 02:33
Then you are not treating NFC and NFD the same? What a shame! Great answer anyway. — curiousguy, Oct 01 '11 at 03:01
You can even replace `std::find_if() != end` by `std::any_of` — Jarod42, Apr 19 '16 at 21:00
this should have been the accepted answer, it covers many different solutions, and also shows you how much C++ has evolved over the years. — A. Smoliak, Sep 24 '18 at 06:11

score 7 · Accepted Answer · answered Sep 30 '11 at 23:06

STL way:

struct TestFunctor
{
  bool stringIsCorrect;
  TestFunctor()
  :stringIsCorrect(true)
  {}

  void operator() (char ch)
  {
    if(stringIsCorrect && !((ch <= 'z' && ch >= 'a') || (ch <= 'Z' && ch >= 'A')))
      stringIsCorrect = false;
  }
};

TestFunctor functor;

// for_each takes the functor by value, so keep the copy it returns
functor = for_each(name.begin(), name.end(), functor);

if(functor.stringIsCorrect)
  cout << "Yay";

answered Sep 30 '11 at 23:06

GreenScape

how about if I wanted to include a space as part of the test parameter? how would you do it? – miatech Oct 01 '11 at 19:50
replace `if(stringIsCorrect && !((ch <= 'z' && ch >= 'a') || (ch <= 'Z' && ch >= 'A')))` with `if(stringIsCorrect && !((ch <= 'z' && ch >= 'a') || (ch <= 'Z' && ch >= 'A') || (ch == ' ')))` – GreenScape Oct 02 '11 at 01:10
Note that encodings such as [EBCDIC](https://en.wikipedia.org/wiki/EBCDIC) have characters between `'a'` and `'z'` that are not letters. – Jarod42 Apr 19 '16 at 20:57

Lev · Answer 3 · 2011-10-01T09:11:19.540

If you use Boost, you can use boost::algorithm::is_alpha predicate to perform this check. Here is how to use it:

#include <boost/algorithm/string.hpp>

const char* text = "hello world";
bool isAlpha = boost::algorithm::all( text, boost::algorithm::is_alpha() );

Update: As the documentation states, "all() checks all elements of a container to satisfy a condition specified by a predicate". The call to all() is needed here, since is_alpha() actually operates on characters.

Hope I helped.

edited Oct 01 '11 at 09:11

answered Oct 01 '11 at 00:17

Lev

You probably meant `const char *test` – curiousguy Oct 01 '11 at 02:25
You may want to show the fully qualified names: `all` and `is_alpha` by themselves won't work unless you have the correct `using namespace`, which is generally not recommended. Also, what is `text1`? – Alexis Wilke Jun 19 '15 at 22:42

score 3 · Answer 4 · answered May 17 '19 at 08:47

C++11 approach using std::all_of:

std::all_of(std::begin(name), std::end(name),
    [](unsigned char c){ return std::isalpha(c) != 0; });

std::all_of returns true only if the supplied predicate returns true for every element; for an empty range it returns true.
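
A sketch of how this could be wrapped in a function and extended to also accept spaces, as asked in a comment on the accepted answer. The name is_letters_or_spaces is hypothetical, and treating only ' ' (not tabs or other whitespace) as valid is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if every character is a letter or a space.
bool is_letters_or_spaces(const std::string& s) {
    // The lambda takes unsigned char so std::isalpha never sees a negative value.
    return std::all_of(s.begin(), s.end(), [](unsigned char c) {
        return std::isalpha(c) || c == ' ';
    });
}
```

Combined with std::getline, this would accept input such as John Smith while still rejecting digits and punctuation.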

answered May 17 '19 at 08:47

Galik

score 2 · Answer 5 · answered Sep 30 '11 at 23:17

I would suggest investigating the ctype library: http://www.cplusplus.com/reference/std/locale/ctype/

For example, the member function is (see ctype.is) checks the properties of a character in a locale-sensitive manner:

#include <locale>
using namespace std;
bool is_alpha(char c) {
    locale loc;  // copy of the current global locale
    return use_facet< ctype<char> >(loc).is( ctype<char>::alpha, c);
}
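
The same facet can also scan a whole string in one call with its scan_not member, which returns a pointer to the first character that does not match the given mask. The sketch below assumes that approach; the function name all_alpha_in_locale and the default-locale parameter are hypothetical choices.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: true if every character of s is alphabetic in loc.
bool all_alpha_in_locale(const std::string& s,
                         const std::locale& loc = std::locale()) {
    const std::ctype<char>& facet = std::use_facet<std::ctype<char> >(loc);
    const char* first = s.data();
    const char* last = first + s.size();
    // scan_not stops at the first character NOT classified as alpha.
    return facet.scan_not(std::ctype_base::alpha, first, last) == last;
}
```

Passing std::locale::classic() gives the behaviour of the "C" locale regardless of the global locale.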

score 2 · Answer 6 · answered Sep 30 '11 at 23:40

  for (string::iterator it=name.begin(); it!=name.end(); ++it)
  {
    if ((*it) < 0x61 || (*it) > 0x7A)  // outside ASCII 'a'..'z'
    {
      // string contains characters other than a-z
    }
  }

answered Sep 30 '11 at 23:40

Pubby

the hex count is wrong. please update it with correct HEX codes ( refer asciitable.com ). Also, you can case for 'A'-'Z' as well. – CyprUS Oct 27 '15 at 01:01
Avoid magic numbers and use `'A'` and `'Z'`, but note that this condition does not hold on systems that use an encoding such as [EBCDIC](https://en.wikipedia.org/wiki/EBCDIC). – Jarod42 Apr 19 '16 at 21:02
