
I am trying to write a simple C++ program.


Goal: open an existing text file, read each person's name and surname into name and surname strings, print them, and move on to the next line. Repeat until the end of the file.

I have two problems.

I am using Windows 8.1 and Visual Studio 2017 with the latest update.

My main code is below:

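#include "stdafx.h"   // with precompiled headers this must come first; includes above it are ignored
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;


int main() {
    FILE *fPtr = fopen("newStudentsList.txt", "r");

    if (fPtr == NULL) {
        cout << "File could not be opened.\n";
        system("pause");
        return 1;
    }

    char name[100];
    char surname[100];

    // fscanf returns how many fields it filled; stop as soon as it cannot read both.
    // (Testing feof() before reading is fragile: the flag is only set after a read fails.)
    // Whitespace in the format matches any run of spaces, tabs or newlines.
    while (fscanf(fPtr, "%99s %99s", name, surname) == 2) {
        cout << name << " " << surname << endl;
    }

    fclose(fPtr);
    system("pause");
    return 0;
}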

In the output, I cannot see Turkish characters properly. This is my first problem.


My second problem is that I cannot read names and surnames properly, because in the text file they are not separated by a consistent number of tabs or spaces, and some people have one given name while others have two.


All the files are here


How can I print non-English characters?


How can I take names and surnames properly?

  • Please, only ask one question at a time, see also https://stackoverflow.com/help/how-to-ask – Murmel Nov 01 '17 at 10:08
  • Why are you not using fstream to handle the file rather than the C-style FILE? It would be more efficient. – leuage Nov 01 '17 at 10:13
  • First, identify the encoding of your text file. Possible ones are Unicode UTF-8 or UTF-16LE, or an MBCS/code-page encoding. Until you know what you are trying to read, displaying it correctly will be all but impossible. When you know the encoding, post a new question asking how to display it. – Richard Critten Nov 01 '17 at 10:13
  • The second question seemed easy compared to the first, main one, so I asked it too. –  Nov 01 '17 at 10:14
  • @J.Snipe that's irrelevant. You should ask just one question per question. – bolov Nov 01 '17 at 10:15
  • How can I identify my text file's encoding? @RichardCritten –  Nov 01 '17 at 10:15
  • @bolov Okay, I am sorry, I am new here. –  Nov 01 '17 at 10:16
  • @pravaka no it will not. Using *C* functions in a *C++* program is a very bad idea and a very strong smell. It means you won't be able to use *any* of C++'s features like streams and iterators. – Panagiotis Kanavos Nov 01 '17 at 10:30
  • It is UTF-8 @RichardCritten –  Nov 01 '17 at 10:39
  • @J.Snipe mixing C and C++ features is a very bad idea. You should use [streams](https://learn.microsoft.com/en-us/cpp/standard-library/iostream-programming) to open/write to files, as shown [in this C++ tutorial](http://www.cplusplus.com/doc/tutorial/files/). You can use `ifstream` to read non-Unicode files, no matter the encoding. You can use `wifstream` to read UTF16 files. UTF8 files are treated as – Panagiotis Kanavos Nov 01 '17 at 10:40
  • @J.Snipe then use [ifstream](https://learn.microsoft.com/en-us/cpp/standard-library/fstream-typedefs#ifstream) to read std::string and char data from it. C++ still doesn't have a special type for UTF8 strings. UTF8 files should be read and treated like ASCII files. – Panagiotis Kanavos Nov 01 '17 at 10:41
  • @J.Snipe to *display* such strings in the Windows console you can change the *console's codepage* to UTF8 with `chcp 65001`, as [shown here](https://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how) – Panagiotis Kanavos Nov 01 '17 at 10:43

2 Answers

First of all, don't use C functions in C++ programs. C++ has different features, different abstractions and different libraries. Using C constructs prevents you from using them.

C++ uses streams to read/write to files, memory and string buffers, over the network etc. It has a large number of algorithms that expect a stream and/or iterator as input.

It also has built-in string types for narrow characters (std::string), wide characters (std::wstring), UTF16 (std::u16string) and UTF32 (std::u32string). You can specify matching string literals in your code. It even has a form of type inference with the auto keyword.

As of C++17, C++ still doesn't have a type for UTF8. Programmers should treat UTF8 strings and files as sequences of bytes and use char and std::string to store them. These values should be converted to other codepages or Unicode types as needed.

This means that you shouldn't have to do anything more than this to display the contents of a UTF8 file to the console. The code is taken from the Input/Output with files tutorial:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main () {
  string line;
  ifstream myfile ("newStudentsList.txt");
  if (myfile.is_open())
  {
    while ( getline (myfile,line) )
    {
      cout << line << '\n';
    }
    myfile.close();
  }

  else cout << "Unable to open file"; 

  return 0;
}

By default, the console uses the codepage of your system locale. You can change it to the UTF8 codepage by typing:

chcp 65001

before running your application. UTF8 strings should then display correctly, assuming the console font includes the required characters.

UPDATE

You can specify UTF8 literals, but (up to C++17) the storage is still char, e.g.:

const char* str1 = u8"Hello World";  
const char* str2 = u8"\U0001F607 is O:-)";  
const char* str3 = u8" = \U0001F607 is O:-)"; 

or

auto str1 = u8"Hello World";  
auto str2 = u8"\U0001F607 is O:-)";  
Panagiotis Kanavos
  • Thank you for the helpful answer. I just didn't understand the last part. Where should I type chcp 65001? –  Nov 01 '17 at 11:04
  • In the console window, before you run your program. This allows the *console* to use the correct codepage for your text. – Panagiotis Kanavos Nov 01 '17 at 11:07

Whenever I need to output non-ASCII characters in my console programs, I set stdout to Unicode (UTF-16) text mode:

_setmode(_fileno(stdout), _O_U16TEXT);

Once this is done, wide-character code works "as expected"; for example, this code:

std::wcout << L"\x046C" << std::endl;
wprintf(L"\x046C\n");

will promptly output an old Cyrillic letter "big yus": Ѭ

Remember to include these files:

#include <io.h>
#include <fcntl.h>

Here's a short test program for you to play with:

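#include <cstdio>    // _fileno
#include <cwchar>    // wprintf
#include <iostream>
#include <io.h>      // _setmode (MSVC-specific)
#include <fcntl.h>   // _O_U16TEXT
int main() {
    // Put stdout into UTF-16 text mode so wide output reaches the console intact
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"\x046C" << std::endl;
    wprintf(L"\x046C\n");
    return 0;
}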
YePhIcK
  • The OP is reading UTF8 characters, not UTF16. `wcout`, `wprintf`, etc. are for double-byte characters and UTF16. In C++11 and later the proper types for UTF16 are char16_t and u16string. – Panagiotis Kanavos Nov 01 '17 at 16:24