Some answers have already been given. Anyway, I would like to show you the background and some alternative solutions.
Let us first check your attempt. You wrote:
if (text.at(i) == 'a' || 'e' || 'i' || 'o' || 'u' || 'y')
And you hoped that the letter at index i would be compared to all of the listed letters. But this will not happen. Why?
An if-statement expects a Boolean expression. This expression will be evaluated. If the result is true, then the statement or compound statement after the if is executed. OK, then let us look at the expression that must be evaluated:
text.at(i) == 'a' || 'e' || 'i' || 'o' || 'u' || 'y'
consists of operands and operators. Operators have properties. One of them is precedence. Some operators have higher precedence than others. This is why "2+3*4" evaluates to 14 and not to 20: "*" has a higher precedence than "+". So, first the multiplication is done and then the addition. You could also write "2+(3*4)". That is exactly the same.
The next property is associativity. If the precedence is equal, then the associativity decides how the operators are grouped. The associativity is either "left" or "right". This means that operators of equal precedence are grouped from left to right or from right to left.
The operator precedence page in the CPP reference has a very good description. Please read it. If we analyze it, we will see that the comparison operator == has a higher precedence than the logical or operator ||. So, we can already rewrite the expression as:
(text.at(i) == 'a') || 'e' || 'i' || 'o' || 'u' || 'y'
The following logical or operators all have the same precedence. Then the associativity decides in which sequence they are evaluated: from left to right. So, we can now rewrite the expression to
((text.at(i) == 'a') || 'e') || 'i' || 'o' || 'u' || 'y' // then
(((text.at(i) == 'a') || 'e') || 'i' ) || 'o' || 'u' || 'y' // then
((((text.at(i) == 'a') || 'e') || 'i' ) || 'o' ) || 'u' || 'y' // then finally
(((((text.at(i) == 'a') || 'e') || 'i' ) || 'o' ) || 'u' ) || 'y'
OK, next we need to understand that character literals like 'a' or 'e' have values. Often they are encoded in ASCII. Then 'a' has the value 97, 'e' has the value 101, and so on. You may look in an ASCII table. But how can such values be interpreted as Boolean true or false? Simple: C++ treats 0 as false and all other values as true.
Additionally, you need to learn about Boolean short-circuit evaluation. Please read up on it. It means: if the result is already clear, then no further operation will be performed. For example, if text.at(i) == 'a' is true, so if the letter really is 'a', then the result is true. No further operation will be done. Easy. If not, then the expression becomes:
(((((false) || 'e') || 'i' ) || 'o' ) || 'u' ) || 'y'
Then the evaluation continues with false || 'e'. And 'e' is equal to 101, resulting in false || 101. And this is always true. End of evaluation. Nothing more will be done. The result is true either way: because the letter is 'a', or, if it is not, because of the 'e'. This is called a tautology: always true. And hence, all characters are printed.
And, our friend, the compiler with all warnings enabled, or static code analysis, will find this problem and inform us.
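To make the tautology tangible, here is a minimal sketch that wraps the condition from the question in a function. The function names are made up for this illustration, and depending on the compiler and warning level you may see a warning about a constant operand, which is exactly the hint mentioned above.

```cpp
#include <cstddef>
#include <string>

// The condition from the question, copied unchanged.
// Hypothetical helper name, used only for this illustration.
bool buggyIsVowel(char c) {
    return c == 'a' || 'e' || 'i' || 'o' || 'u' || 'y';
}

// The fully parenthesized form derived above. It means exactly the same.
bool buggyIsVowelParenthesized(char c) {
    return (((((c == 'a') || 'e') || 'i') || 'o') || 'u') || 'y';
}

// Counts how many characters of the text pass the buggy test.
std::size_t countAccepted(const std::string& text) {
    std::size_t count = 0;
    for (char c : text) {
        if (buggyIsVowel(c)) ++count;
    }
    return count;
}
```

Whatever you pass in, both functions return true, so countAccepted simply returns the length of the text.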
In another answer I saw an example like std::cout << 97 || 101 || 105 || 111 || 117 || 121;
My guess is that the author does not fully understand what it means. Here the inserter operator << has a higher precedence than the logical or operator ||. You can also rewrite this to:
(((((std::cout << 97) || 101 ) || 105 ) || 111 ) || 117 ) || 121;
As you can see, first we write the number 97 to std::cout. The inserter operator always returns a reference to the stream for which it was called. So, next we have:
((((std::cout || 101 ) || 105 ) || 111 ) || 117 ) || 121;
Now, std::cout is used in a Boolean expression. And guess what? std::cout has a bool conversion operator. It returns true if everything was OK. And since everything is OK after writing something to std::cout, the short-circuit evaluation kicks in, and no further operation is executed. The resulting statement is the same as writing 'true;' on a line. The output is only a side effect of the evaluation of the Boolean expression. But this was just a side note.
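If you want to check this yourself, the following sketch evaluates the same expression on a std::ostringstream instead of std::cout, so that the written text can be inspected afterwards. The struct and function names are assumptions made for this illustration only.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type: the value of the whole expression plus the
// text that actually reached the stream.
struct InsertResult {
    bool value;
    std::string text;
};

InsertResult insertThenOr() {
    std::ostringstream out;
    // Grouped as (((((out << 97) || 101) || 105) || 111) || 117) || 121.
    // Only 97 is inserted; the stream then converts to true and the
    // remaining operands are never evaluated.
    const bool value = out << 97 || 101 || 105 || 111 || 117 || 121;
    return { value, out.str() };
}
```

The stream contains just "97" afterwards, and the expression as a whole is true.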
What do we learn from that? We should really understand the expressions that we are writing. And a very important recommendation: use brackets, even if redundant. Always use plenty of brackets to make clear what you mean. You would have recognized the error yourself if you had used brackets.
Isn’t it astonishing, how many explanations are necessary to analyze such a simple statement? But now we understand and can work on the solution.
Recommended solution
In order to compare a letter with all of the vowels 'a', 'e', 'i', 'o', 'u' and so on, we need to compare the letter with each of the vowels 'a', 'e', 'i', 'o', 'u' and so on. Yes. That is it:
if ( (text.at(i) == 'a') || (text.at(i) == 'e') || (text.at(i) == 'i') || (text.at(i) == 'o') || (text.at(i) == 'u') ) . . .
BTW, the letter 'y' is not considered a vowel. Anyway, you can search for whatever you want.
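Packed into a small function, the recommended comparison could look like the sketch below. The function names are assumptions for this illustration, and only the lowercase letters from the statement above are checked.

```cpp
#include <cstddef>
#include <string>

// Explicit comparison against each vowel, with redundant brackets
// to make the intent obvious.
bool isVowel(char c) {
    return (c == 'a') || (c == 'e') || (c == 'i') || (c == 'o') || (c == 'u');
}

// Counts the lowercase vowels in a text using the function above.
std::size_t countVowels(const std::string& text) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (isVowel(text.at(i))) ++count;
    }
    return count;
}
```

If you also want to treat 'y' or uppercase letters as matches, add the corresponding comparisons in the same style.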
Additionally, there are many, many different solutions possible.
Some are faster than others, some are more intuitive, some require less typing, and so on.
The two solutions shown in the other answers, using a std::set with find or a std::array together with std::find, are not the fastest solutions.
In the case of the std::set, a std::unordered_set together with its count function would be more efficient and faster, because the std::unordered_set uses fast hashing algorithms for searching. And std::find can also be outperformed by a simple calculation. This I will show at the end.
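As a minimal sketch of the std::unordered_set variant: the names below are assumptions for this illustration, and the set also contains 'y' and the uppercase letters, just to show that extending the list is trivial.

```cpp
#include <unordered_set>

// All characters that shall count as a match. Built once at program start.
const std::unordered_set<char> vowelSet{
    'A', 'E', 'I', 'O', 'U', 'Y', 'a', 'e', 'i', 'o', 'u', 'y'
};

// count returns 0 or 1 for a set, so it works directly as a membership test.
bool isVowelInSet(char c) {
    return vowelSet.count(c) != 0;
}
```

How this compares in speed to the other variants depends on compiler and platform, so measure before you rely on it.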
If you prefer something more readable (and maybe know other languages), you can model an in-like operation, for example:
if (5, in{ 1, 2, 3, 4, 5 }) . . .
But for this you need to do some really advanced stuff. I just want to show you what is possible. As I said, there are many, many different possible solutions. For the above statement we would additionally need to define a wrapper class for a std::initializer_list and overload the comma operator for it. As I said, some heavy stuff:
template<class T>
struct in {
    in(const std::initializer_list<T>& il) : ref(il) {}
    const std::initializer_list<T>& ref; // only valid within the full expression
};
// Overloaded comma operator: true if lhs is one of the listed values
template<class T>
bool operator,(const T& lhs, const in<T>& rhs) {
    return std::find(rhs.ref.begin(), rhs.ref.end(), lhs) != rhs.ref.end();
}
Forget it for now.
Now, last but not least, the fastest possible solution for alpha ASCII letters. The if-statement:
if ((0x02208222 >> (text.at(i) & 0x1f)) & 1) . . .
will give you what you need. We have here only a fast bitwise-and, then a bitwise-shift and again a bitwise-and. This can be evaluated superfast.
But how the heck?
This know-how is very, very old and was used in ancient times, when people were working in assembler on their 8-bit microcontrollers. We rely on ASCII for alpha letters and on its inventors, who made some smart decisions when defining the code.
Please see the below picture.
If you look at the uppercase alpha letters and the lowercase alpha letters, then you can see that only the lowest 5 bits are needed to encode any alpha letter. That is clear, because with 5 bits you can encode values between 0 and 31, and we have only 26 alpha letters in western languages. Bit number 5 (the sixth bit, counting starts with 0) decides about the case. A set bit number 5 means lowercase. This you can clearly see in the tables. And because of that, the delta between a lowercase alpha letter and its uppercase counterpart is always 32, which in binary is a number with only bit number 5 set. And if we want to ignore the case, then we do a bitwise-and (masking) with 0001 1111 binary, which is 0x1f hex, and we get the numbers 1 to 26 for the letters 'A' to 'Z' in either case. That explains the part text.at(i) & 0x1f. Aha.
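The masking step can be checked at compile time. The following sketch assumes that the execution character set is ASCII; the function name is made up for this illustration.

```cpp
// Maps 'A'..'Z' and 'a'..'z' to 1..26 by keeping only the lowest 5 bits.
// Assumption: the execution character set is ASCII.
constexpr int letterIndex(char c) {
    return c & 0x1f;
}

static_assert(('a' - 'A') == 32, "case differs only in bit number 5");
static_assert(('A' | 0x20) == 'a', "setting bit number 5 gives lowercase");
static_assert(letterIndex('a') == 1, "'a' maps to 1");
static_assert(letterIndex('A') == 1, "'A' maps to 1 as well");
static_assert(letterIndex('z') == 26, "'z' maps to 26");
```

If one of these assertions fails on your platform, the character set is not ASCII and the whole trick does not apply.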
The next smart idea is to encode the alpha letters that we want to search for as bit positions. For example, we take a 32-bit unsigned integer value. In this value, we set the bit according to the number that the letter has. If we search for an 'a', then we see that 'a' has the associated number 1, and we set bit number 1 (the 2nd bit, counting starts with 0). If we additionally look for an 'e', we set bit number 5 (the sixth bit, counting starts with 0) in our value. And for whichever alpha character we want to search, we set the corresponding bit in our 32-bit unsigned int value.
After we have done this for all the alpha letters that we are searching for, we get the resulting number 0x02208222 hex. Aha. Now we know where this is coming from. And if we now want to check for a certain alpha letter, then we shift the bits in this number as many positions to the right as the number of that letter indicates. If we then find a set bit at the lowest position, we have a positive match.
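You do not have to calculate the magic number by hand. The sketch below builds it at compile time from a list of letters and shows the lookup. It needs C++14 for the loop in the constexpr function and assumes ASCII; both function names are assumptions for this illustration.

```cpp
#include <cstdint>

// Sets one bit per letter: bit 1 for 'a'/'A', bit 2 for 'b'/'B', and so on.
constexpr std::uint32_t makeLetterMask(const char* letters) {
    std::uint32_t mask = 0;
    for (; *letters != '\0'; ++letters) {
        mask |= std::uint32_t{1} << (*letters & 0x1f);
    }
    return mask;
}

// Shifts the bit that belongs to c down to position 0 and tests it.
constexpr bool inLetterMask(std::uint32_t mask, char c) {
    return ((mask >> (c & 0x1f)) & 1u) != 0;
}

// The letters used in this answer give exactly the constant from above.
static_assert(makeLetterMask("aeiouy") == 0x02208222u, "mask for a, e, i, o, u, y");
```

For a different set of letters, just pass a different string, for example makeLetterMask("aeiou") if you do not want to match 'y'.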
Complicated, but cool. Finds all possible sets of alpha letters ultra-fast.
But I admit, you need to think about that . . .
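One thing to think about: the trick looks only at the lowest 5 bits, so it is meant for alpha letters only. A character like '!' (value 33) has the same lowest 5 bits as 'a' and would be reported as a match. If your text can contain other characters, a guard is needed. Here is a minimal sketch using std::isalpha, with function names made up for this illustration; the extra check costs some of the speed.

```cpp
#include <cctype>

// The plain bit test. Correct only if c is known to be an alpha ASCII letter.
bool isVowelUnchecked(char c) {
    return ((0x02208222u >> (c & 0x1f)) & 1u) != 0;
}

// Same test, but anything that is not a letter is rejected first.
bool isVowelChecked(char c) {
    return (std::isalpha(static_cast<unsigned char>(c)) != 0) && isVowelUnchecked(c);
}
```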
So, I hope I could give you a somewhat understandable explanation. If there are open questions, then please ask.
Benchmark
To compare the different solutions, I added a benchmark. The result is remarkable. I ran the test 3 times with 1 billion random test cases each.
Resulting Ranking:
1. ASCII approach 100%
2. Unordered_set 395%
3. Standard comparison 522%
4. constexpr array 1046%
5. set 1638%
Implementation of Benchmark:
#include <array>
#include <iostream>
#include <random>
#include <algorithm>
#include <chrono>
#include <unordered_set>
#include <set>

constexpr size_t TestDataSize = 1'000'000'000u;
constexpr size_t NumberOfTestRuns = 3u;
using TestData = std::array<char, TestDataSize>;
const char SourceCharacter[]{"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"};
TestData testData;

// Bit mask lookup: one bit per letter, case ignored via & 0x1f
bool isVowel_ASCII(const char c) {
    return (0x02208222 >> (c & 0x1f)) & 1;
}

// Linear search in a constexpr array
static constexpr std::array<char, 12> vowels{ 'A', 'E', 'I', 'O', 'U', 'Y', 'a', 'e', 'i', 'o', 'u', 'y' };
bool isVowel_ConstexprArray(const char c) {
    return (std::end(vowels) != std::find(begin(vowels), end(vowels), c));
}

// Hash lookup
const std::unordered_set<char> vowel_UnorderedSet{ 'A', 'E', 'I', 'O', 'U', 'Y', 'a', 'e', 'i', 'o', 'u', 'y' };
bool isVowel_UnorderedSet(const char c) {
    return vowel_UnorderedSet.count(c);
}

// Tree lookup
const std::set<char> vowel_Set{ 'A', 'E', 'I', 'O', 'U', 'Y', 'a', 'e', 'i', 'o', 'u', 'y' };
bool isVowel_Set(const char c) {
    return (vowel_Set.find(c) != vowel_Set.end());
}

// Explicit comparison against every vowel
bool isVowel_Standard(const char c) {
    return (c == 'a') || (c == 'e') || (c == 'i') || (c == 'o') || (c == 'u') || (c == 'y') ||
           (c == 'A') || (c == 'E') || (c == 'I') || (c == 'O') || (c == 'U') || (c == 'Y');
}

int main() {
    std::random_device rd;
    std::mt19937 generator(rd());
    // Note: the upper bound is inclusive, so index sizeof(SourceCharacter)-1, the
    // terminating '\0', is drawn as well. None of the functions counts it as a vowel.
    std::uniform_int_distribution<> intRange(0, sizeof(SourceCharacter)-1);

    size_t counter_ASCII{};
    size_t counter_ConstexprArray{};
    size_t counter_UnorderedSet{};
    size_t counter_Set{};
    size_t counter_Standard{};

    auto start = std::chrono::system_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - start);

    for (size_t testRun{}; testRun < NumberOfTestRuns; ++testRun) {
        std::cout << "\n\n\nTest run " << (testRun + 1) << "\n\n";

        // Fresh random test data for every run
        std::generate(testData.begin(), testData.end(), [&]() {return SourceCharacter[intRange(generator)]; });
        counter_ASCII = 0u;
        counter_ConstexprArray = 0u;
        counter_UnorderedSet = 0u;
        counter_Set = 0u;
        counter_Standard = 0;

        // --------------------------------------------------------------------------------
        start = std::chrono::system_clock::now();
        for (size_t testCase{}; testCase < TestDataSize; ++testCase) {
            if (isVowel_ASCII(testData[testCase])) ++counter_ASCII;
        }
        elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - start);
        std::cout << "Test 'ASCII` solution. Resulting vowel count: " << counter_ASCII << "\t Duration: " << elapsed.count() << " ms\n";

        // --------------------------------------------------------------------------------
        start = std::chrono::system_clock::now();
        for (size_t testCase{}; testCase < TestDataSize; ++testCase) {
            if (isVowel_UnorderedSet(testData[testCase])) ++counter_UnorderedSet;
        }
        elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - start);
        std::cout << "Test 'unordered_set' solution. Resulting vowel count: " << counter_UnorderedSet << "\t Duration: " << elapsed.count() << " ms\n";

        // --------------------------------------------------------------------------------
        start = std::chrono::system_clock::now();
        for (size_t testCase{}; testCase < TestDataSize; ++testCase) {
            if (isVowel_Standard(testData[testCase])) ++counter_Standard;
        }
        elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - start);
        std::cout << "Test 'standard' solution. Resulting vowel count: " << counter_Standard << "\t Duration: " << elapsed.count() << " ms\n";

        // --------------------------------------------------------------------------------
        start = std::chrono::system_clock::now();
        for (size_t testCase{}; testCase < TestDataSize; ++testCase) {
            if (isVowel_ConstexprArray(testData[testCase])) ++counter_ConstexprArray;
        }
        elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - start);
        std::cout << "Test 'Constexpr array` solution. Resulting vowel count: " << counter_ConstexprArray << "\t Duration: " << elapsed.count() << " ms\n";

        // --------------------------------------------------------------------------------
        start = std::chrono::system_clock::now();
        for (size_t testCase{}; testCase < TestDataSize; ++testCase) {
            if (isVowel_Set(testData[testCase])) ++counter_Set;
        }
        elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - start);
        std::cout << "Test 'set' solution. Resulting vowel count: " << counter_Set << "\t Duration: " << elapsed.count() << " ms\n";
    }
}
(compiled with optimizations on)
Result:
Test run 1
Test 'ASCII` solution. Resulting vowel count: 226384481 Duration: 920 ms
Test 'unordered_set' solution. Resulting vowel count: 226384481 Duration: 3630 ms
Test 'standard' solution. Resulting vowel count: 226384481 Duration: 4780 ms
Test 'Constexpr array` solution. Resulting vowel count: 226384481 Duration: 9620 ms
Test 'set' solution. Resulting vowel count: 226384481 Duration: 15060 ms
Test run 2
Test 'ASCII` solution. Resulting vowel count: 226421607 Duration: 920 ms
Test 'unordered_set' solution. Resulting vowel count: 226421607 Duration: 3630 ms
Test 'standard' solution. Resulting vowel count: 226421607 Duration: 4800 ms
Test 'Constexpr array` solution. Resulting vowel count: 226421607 Duration: 9650 ms
Test 'set' solution. Resulting vowel count: 226421607 Duration: 15060 ms
Test run 3
Test 'ASCII` solution. Resulting vowel count: 226416438 Duration: 920 ms
Test 'unordered_set' solution. Resulting vowel count: 226416438 Duration: 3630 ms
Test 'standard' solution. Resulting vowel count: 226416438 Duration: 4800 ms
Test 'Constexpr array` solution. Resulting vowel count: 226416438 Duration: 9620 ms
Test 'set' solution. Resulting vowel count: 226416438 Duration: 15070 ms
As expected.
- The ASCII solution is the fastest. It was designed for that purpose. But it is for alpha ASCII letters only, not that portable, not easy to understand and not that readable . . .
- The std::unordered_set is ultrafast, easy to read and understand and is hence the idiomatic approach
- The naive approach of comparing against all vowels is astonishingly fast. Never underestimate an optimizing compiler . . .
@OP Please use the recommended solution