Problem iterating over string with umlauts

Question

In my program, a string (which may contain umlauts like "ä" and "ö" or other special characters like "ß") is to be enciphered.

The text is passed as the second command-line argument (argv[2]) and stored in the string text.

The function findInMap looks up the passed character in a map consisting of 26 keys (representing columns) and string vector values (representing rows).

It returns a vector of integer pairs representing rows and columns.

Everything works as expected as long as no special characters are used. If they are used, a floating point exception occurs.

// Iterate over all characters in string.
for (int i = 0; i < text.size(); i++) {
  vector<pair<int, int>> possibilites
    = findInMap(cipherTable, text.substr(i, 1));
  // Choose one possibility.
  int choice = rand() % possibilites.size();
  cipherString = cipherString + alphabet[possibilites[choice].first]
                  + to_string(possibilites[choice].second + 1);
}
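A possible explanation for the floating point exception, assuming findInMap returns an empty vector when the character is not in the table: `rand() % possibilites.size()` then takes a remainder by zero, which is undefined behaviour for integers and is typically reported as a floating point exception (SIGFPE) on Linux/x86. The following is a minimal sketch of a guard, not the original code; `chooseIndex` is a hypothetical helper name and it needs C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the original program): picks a random
// index into the candidate list, or returns std::nullopt when the list is
// empty, so the modulo operation never sees a zero divisor.
std::optional<std::size_t> chooseIndex(
    const std::vector<std::pair<int, int>>& candidates) {
    if (candidates.empty()) {
        return std::nullopt;  // the character was not found in the table
    }
    return static_cast<std::size_t>(std::rand()) % candidates.size();
}
```

In the loop, the result could be checked before it is used, for example by skipping the character or reporting an error when no value is returned.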

Interestingly, when I change the lookup inside the loop to

vector<pair<int, int>> possibilites
  = findInMap(cipherTable, "ü");

ignoring the input and just looking up "ü", no error occurs, but the for loop is executed twice. Apparently the string "ü" has size 2.

How can I

1. iterate over the actual number of characters, and
2. extract the correct character from text?
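One way to approach both points, assuming the argument arrives UTF-8 encoded (which depends on the terminal and locale): in UTF-8, "ü" occupies two bytes, so `text.size()` counts bytes and `text.substr(i, 1)` extracts half a character. The sketch below splits the string into one `std::string` per code point by looking at the lead byte. `splitUtf8` is a hypothetical name, and the function does not validate its input.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a UTF-8 encoded string into one std::string
// per code point. Assumes the input is valid UTF-8; malformed bytes are
// passed through one byte at a time rather than rejected.
std::vector<std::string> splitUtf8(const std::string& text) {
    std::vector<std::string> chars;
    std::size_t i = 0;
    while (i < text.size()) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        std::size_t len = 1;                      // 0xxxxxxx: single byte
        if ((lead & 0xE0) == 0xC0) len = 2;       // 110xxxxx: two bytes
        else if ((lead & 0xF0) == 0xE0) len = 3;  // 1110xxxx: three bytes
        else if ((lead & 0xF8) == 0xF0) len = 4;  // 11110xxx: four bytes
        chars.push_back(text.substr(i, len));     // substr clamps at the end
        i += len;
    }
    return chars;
}
```

The loop could then run over `splitUtf8(text)` and pass each element to findInMap. Note that this yields code points, not user-perceived characters: an "ü" written as "u" plus a combining diaeresis would still come out as two elements.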

Help is very much appreciated.

Full code: https://github.com/BooneyNoobington/bbfe/tree/master/

Not an exact duplicate, but a place to start reading: https://stackoverflow.com/questions/17103925/how-well-is-unicode-supported-in-c11 There have been changes in C++14, C++17 and C++20, but I don't think the situation has really improved. — Richard Critten, May 01 '20 at 16:13
You may want to read this: [How to create a minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) — Andreas Wenzel, May 01 '20 at 16:41
The map contains 26 elements? A-Z are 26 characters, but äöüß are four more characters. How shall that work? — Werner Henze, May 01 '20 at 17:08
@RichardCritten: The characters `äöüß` are part of the 8-bit [Extended ASCII](https://en.wikipedia.org/wiki/Extended_ASCII) character set. Therefore, there is no need for UNICODE. — Andreas Wenzel, May 01 '20 at 17:25
Just as a side note: In the German language, in situations where these characters are not supported, `ä` is commonly written as `ae`, `ö` as `oe`, `ü` as `ue` and `ß` as `ss` to work around this problem. — Andreas Wenzel, May 01 '20 at 17:31
@AndreasWenzel Have a read of https://en.cppreference.com/w/cpp/language/translation_phases. My reading is that Extended ASCII is not supported and that characters which are not in the basic source character set are mapped to Unicode escape sequence(s). _"...Any source file character that cannot be mapped to a character in the basic source character set is replaced by its universal character name..."_ — Richard Critten, May 01 '20 at 18:27
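
The transliteration mentioned in the comments above (`ä` to `ae`, `ö` to `oe`, `ü` to `ue`, `ß` to `ss`) could be applied before enciphering, so that a 26-letter table suffices. This is only a sketch under the assumption that the input is UTF-8; `transliterateGerman` is a hypothetical name, and only the lowercase forms are handled.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: replaces UTF-8 encoded lowercase German umlauts and
// the sharp s with their usual two-letter ASCII spellings.
std::string transliterateGerman(std::string text) {
    static const std::pair<const char*, const char*> replacements[] = {
        {"\xC3\xA4", "ae"},  // ä (U+00E4)
        {"\xC3\xB6", "oe"},  // ö (U+00F6)
        {"\xC3\xBC", "ue"},  // ü (U+00FC)
        {"\xC3\x9F", "ss"},  // ß (U+00DF)
    };
    for (const auto& [from, to] : replacements) {
        const std::string needle(from);
        std::size_t pos = 0;
        while ((pos = text.find(needle, pos)) != std::string::npos) {
            text.replace(pos, needle.size(), to);
            pos += 2;  // skip the two ASCII characters just written
        }
    }
    return text;
}
```

The trade-off is that the substitution is not reversible: after deciphering, "ue" cannot be told apart from an original "ü".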
