
I would like to read UTF-8 characters from a file in a C++ program on Linux. When I read the file with fgets() and print the result, I get junk characters in place of the UTF-8 character. input.txt contains the text triuöschen

#include <stdio.h>
#include <string.h>
#include <string>

int main()
{
    FILE* fpointer = NULL;
    std::string szFileData = "";
    char cLine[1025] = "\0";
    int iTrailingPointer = 0;
    try {

        const char* pcFileName = "input.txt";
        //OPEN THE FILE IN READ MODE...
        fpointer = fopen(pcFileName, "r");
        if (fpointer == NULL)
        {
            printf("\n Error reading file %s", pcFileName);
            return 1;
        }
        //READ THE FILE DATA TILL THE END OF FILE...
        while (!feof(fpointer))
        {
            memset(cLine, '\0', sizeof cLine);
            if (fgets(cLine, 1024, fpointer) == NULL)
            {
                break; // nothing was read: end of file or read error
            }
            iTrailingPointer = (int)strlen(cLine) - 1;

            //REMOVE TRAILING SPACES AND NEWLINES...
            while (iTrailingPointer >= 0 &&
                (cLine[iTrailingPointer] == '\n' || cLine[iTrailingPointer] == ' ' ||
                cLine[iTrailingPointer] == '\t'))
            {
                iTrailingPointer--;
            }

            cLine[iTrailingPointer + 1] = '\0';
            szFileData = szFileData + cLine;
            printf("\n szFileData: %s", szFileData.c_str());
        }
        fclose(fpointer);

    }
    catch (...) {

    }
    return 0;
}
WhozCraig
Vinayak
  • You seriously need to decide which language you want to use, and edit the tags to that appropriately. – WhozCraig Jun 08 '19 at 06:19
  • @WhozCraig Thanks for the suggestion. I need C++ code to read a file that contains wide characters on Linux. – Vinayak Jun 08 '19 at 06:29
  • @Vinayak You've written code to read byte data. Are you saying that the bytes you've read are not what you expected? Or are you saying that you need help translating those bytes into wide characters? Or are you saying that you want the code rewritten to read wide characters directly? At present it's unclear to me what you want. `fgets` is perfectly capable of reading UTF-8 bytes (since they are just bytes), but it's not capable of interpreting those bytes as characters. – john Jun 08 '19 at 06:41
  • fgets works just fine. My guess is that your console (or whatever you are using to print characters) fails to interpret UTF-8 properly, or your characters aren't written as UTF-8 in the first place. – Radosław Cybulski Jun 08 '19 at 06:47
  • BTW the comment `//READ THE FILE DATA TILL THE END OF FILE...` is wrong, that's not what the code you have written does. It's a very common misunderstanding, see here https://stackoverflow.com/questions/5431941/why-is-while-feoffile-always-wrong – john Jun 08 '19 at 06:47
  • What's your actual goal here? The problem is that you lack the concepts to ask the right question (this is always the case when newbies ask about UTF-8). Radosław is probably right: when you say that you are reading garbage, what is probably happening is that the read is correct but your attempts to print the bytes you've read show as garbage. – john Jun 08 '19 at 06:56
  • @john Thanks for your response. Yes, I am new to UTF-8. I would like to read the contents of the file into a std::string. My next lines of code would process the std::string. – Vinayak Jun 08 '19 at 08:58
  • @Vinayak OK, but you realise that a `std::string` cannot represent Unicode characters? This will make processing the string difficult. If you really want to read into a `std::string` then you just do it the same way that you read any file into a `std::string`, but **the UTF-8 bytes will not be translated into characters**. If you do want to translate the UTF-8 bytes into characters you have to pick some other kind of string, like `std::wstring` or `std::u32string`. – john Jun 08 '19 at 10:04
  • @Vinayak Really you need to do some research on how to process UTF-8. You need to understand the concepts. The difference between bytes and characters for instance, and the different ways to translate one to the other. It's not difficult but newbies often expect it to 'just work', and that's not the case. – john Jun 08 '19 at 10:07
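
The difference between bytes and characters that john describes can be illustrated with a minimal sketch. This is not code from the thread: `read_file` and `decode_utf8` are hypothetical helper names introduced here, and the decoder is deliberately simplified (it rejects bad lead bytes, bad continuation bytes and truncated sequences, but does not reject overlong encodings or surrogate code points). It reads the raw bytes into a `std::string`, then translates them into a `std::u32string` holding one element per code point.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file into a std::string as raw bytes.
// No decoding happens here; UTF-8 text is stored byte for byte.
std::string read_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical helper: translate UTF-8 bytes into Unicode code points.
// Simplified on purpose: overlong forms and surrogates are not rejected.
std::u32string decode_utf8(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > bytes.size() - i - 1) {
            throw std::runtime_error("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::runtime_error("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Assuming the ö in input.txt is stored in its precomposed form (the two bytes C3 B6), the word triuöschen occupies 11 bytes in the `std::string` but decodes to 10 code points, with ö becoming the single value U+00F6. Printing the original `std::string` unchanged to a terminal configured for UTF-8 should still display the text correctly, which is consistent with the suggestion in the comments that the read itself is fine.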

0 Answers