I have the following code:

#include <iostream>
#include <string>
#include <fstream>

using namespace std;
int main()
{
    string rus = "абвгдеёжзийклмнопрстуфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦШЩЪЫЬЭЮЯ";
    string lat = "abvgděëžzijklmnoprstufhcčšŝŭeûâABVGDĚËŽZIJKLMNOPRSTUFHCČŠŜŬEÛÂ";
    ifstream gdata("data.txt");
    if(!gdata){
        gdata.open("data.txt");
    }
    string temp;
    while(gdata){gdata >> temp;}
    gdata.close();
    ofstream sdata("data.txt", ios::out | ios::trunc);
    for(unsigned int i = 0; i < temp.length(); i++){
        int index = rus.find(temp[i]);
        if(index == -1){sdata << temp[i];}
            else{sdata << lat[index];}
    }
    sdata.close();
    return 0;
}

I would like to read Russian Cyrillic text from a file. The program should then look up each character in the string rus and, if it finds it, take the letter at the same index in the string lat and write that letter to the file. Unfortunately, when I type something into the file and then run the program, I get weird output such as @>A8 with random squares (not visible here for some reason). How can I make my program read the Cyrillic properly? I have already looked at over 10 questions here about similar subjects, but as I'm very much a beginner in C++, never mind encoding, I didn't understand the answers in the slightest, mainly because none of them provided an example I could follow.

Also, even if most characters are Latin and there is just one Cyrillic character in the text, the entire text becomes malformed into random letters like @>A8.
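A minimal sketch of the lookup described above, assuming data.txt is saved as UTF-8 (the question does not state the file's encoding). In UTF-8 each Cyrillic letter occupies two bytes, so temp[i] and rus.find(temp[i]) compare single bytes rather than whole letters. The sketch decodes the bytes into code points first and only then consults a table. The names decode_utf8, encode_utf8 and transliterate are hypothetical, and the table covers only a handful of letters for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Decode UTF-8 bytes into Unicode code points.
// Minimal: stray continuation bytes become U+FFFD, nothing else is validated.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }  // ASCII
        else if (b < 0xC0) { cp = 0xFFFD;   len = 1; }  // stray continuation byte
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }  // Cyrillic lives here
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out += cp;
        i += len;
    }
    return out;
}

// Encode one code point back into UTF-8 bytes.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Replace each known Cyrillic code point with its Latin counterpart;
// anything not in the table is written back unchanged.
std::string transliterate(const std::string& utf8_text) {
    // Illustrative subset only; a full table needs one entry per letter.
    static const std::map<char32_t, std::string> table = {
        {0x0430, "a"}, {0x0431, "b"}, {0x0432, "v"}, {0x0433, "g"},
        {0x0434, "d"}, {0x0438, "i"}, {0x043E, "o"}, {0x0440, "r"},
        {0x0441, "s"},
        {0x0436, "\xC5\xBE"},  // U+0436 (zhe) -> U+017E (z with caron), as UTF-8
    };
    std::string out;
    for (char32_t cp : decode_utf8(utf8_text)) {
        auto it = table.find(cp);
        out += (it != table.end()) ? it->second : encode_utf8(cp);
    }
    return out;
}
```

To use it with a file, the whole content could be read into a std::string from a std::ifstream opened with ios::binary, passed through transliterate, and written back with a std::ofstream. A map keyed by code point also avoids the requirement that rus and lat line up index by index, which is easy to break when one string is missing a letter.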

Lightness Races in Orbit
  • You will need to know something about encoding. And probably a bit more than just Unicode, because you appear to be using a strange encoding. Many encodings are fairly pure 8-bit extensions of ASCII, so if you confuse the encoding you get a _different_ set of non-ASCII characters. But "@>A8" is all ASCII. Corrupting the entire text is a strong indication that you're dealing with a _stateful_ encoding, where some bytes mean "switch to different character set". – MSalters Mar 21 '16 at 08:00
  • @MSalters While your input is greatly appreciated, I'm afraid it does not explicitly aid me in producing any solution. – user5749558 Mar 21 '16 at 21:17
  • That was what I feared, which is why I made my response a comment instead of an answer. – MSalters Mar 21 '16 at 21:25
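
One possible reading of the "@>A8" output, offered as an assumption rather than a diagnosis since the thread never states how data.txt was saved: the Cyrillic letters р, о, с, и have the code points U+0440, U+043E, U+0441, U+0438, and their low bytes are exactly the ASCII characters @, >, A, 8. A file stored as little-endian UTF-16 and read byte by byte would therefore show those characters, each followed by a 0x04 byte that many viewers draw as a square. The sketch below only demonstrates that arithmetic; the helper names utf16le_bytes and visible_ascii are hypothetical.

```cpp
#include <string>

// Serialize UTF-16 code units as little-endian bytes, the layout a
// UTF-16 LE text file would have on disk (byte order mark omitted).
std::string utf16le_bytes(const std::u16string& units) {
    std::string bytes;
    for (char16_t u : units) {
        bytes += static_cast<char>(u & 0xFF);         // low byte first
        bytes += static_cast<char>((u >> 8) & 0xFF);  // then high byte
    }
    return bytes;
}

// Keep only printable ASCII, roughly what stays readable once control
// bytes such as 0x04 are rendered as boxes.
std::string visible_ascii(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes)
        if (b >= 0x20 && b < 0x7F) out += static_cast<char>(b);
    return out;
}
```

Under that assumption, visible_ascii(utf16le_bytes(u"\u0440\u043E\u0441\u0438")) yields "@>A8". If the file really is UTF-16, either re-saving it as UTF-8 or decoding two bytes per character before the lookup would be needed; byte-wise std::string indexing cannot match Cyrillic letters in either encoding.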

0 Answers