0

It seems that Latin1 (ISO-8859-1) cannot even store these special characters, so the format of the database must be Latin7 (ISO-8859-7). I could not find an easy function to convert it. Do I really have to write one myself?

UPDATE: I made some progress, as described in this question: Special characters in Visual Studio 2019 C++ project AND executing CMD commands with them

But the problem also appears with the default project settings, without any MySQL libraries, in correctly encoded (UTF-8) files, even when the compile flags are added and even when the "Fix File Encoding" extension is installed.

#include <iostream>
#include <string>

int main() {
    std::string output = "āāāčččēēēē";

    std::cout << output << std::endl;
}

Intro rant: this is my third post about MySQL Connector, because I could not find basic information about it on Google at all (MySQL and MariaDB libraries in C++ using CMake, MinGW). First, nothing explained that GCC cannot compile it for Windows systems; then I had no luck finding out how to use datetime and int fields in the output from the database until I posted a question here (How to return time, date data fields in c++ mysql oracle vs17?).

My issue now is that the strings returned from the database contain special characters (āàčīēļš etc.). Column: test2col, collation: latin7_general_ci.

Here is code that I expected to work, but it does not, perhaps because the table is wrong. Any expertise would help:

#include <iostream>
#include <string>
#include <string_view>

std::string_view itou[256] {
{"\x00",1}            , {"\x01",1}            , {"\x02",1}            , {"\x03",1}            ,
{"\x04",1}            , {"\x05",1}            , {"\x06",1}            , {"\x07",1}            ,
{"\x08",1}            , {"\x09",1}            , {"\x0a",1}            , {"\x0b",1}            ,
{"\x0c",1}            , {"\x0d",1}            , {"\x0e",1}            , {"\x0f",1}            ,
{"\x10",1}            , {"\x11",1}            , {"\x12",1}            , {"\x13",1}            ,
{"\x14",1}            , {"\x15",1}            , {"\x16",1}            , {"\x17",1}            ,
{"\x18",1}            , {"\x19",1}            , {"\x1a",1}            , {"\x1b",1}            ,
{"\x1c",1}            , {"\x1d",1}            , {"\x1e",1}            , {"\x1f",1}            ,
{"\x20",1}            , {"\x21",1}            , {"\x22",1}            , {"\x23",1}            ,
{"\x24",1}            , {"\x25",1}            , {"\x26",1}            , {"\x27",1}            ,
{"\x28",1}            , {"\x29",1}            , {"\x2a",1}            , {"\x2b",1}            ,
{"\x2c",1}            , {"\x2d",1}            , {"\x2e",1}            , {"\x2f",1}            ,
{"\x30",1}            , {"\x31",1}            , {"\x32",1}            , {"\x33",1}            ,
{"\x34",1}            , {"\x35",1}            , {"\x36",1}            , {"\x37",1}            ,
{"\x38",1}            , {"\x39",1}            , {"\x3a",1}            , {"\x3b",1}            ,
{"\x3c",1}            , {"\x3d",1}            , {"\x3e",1}            , {"\x3f",1}            ,
{"\x40",1}            , {"\x41",1}            , {"\x42",1}            , {"\x43",1}            ,
{"\x44",1}            , {"\x45",1}            , {"\x46",1}            , {"\x47",1}            ,
{"\x48",1}            , {"\x49",1}            , {"\x4a",1}            , {"\x4b",1}            ,
{"\x4c",1}            , {"\x4d",1}            , {"\x4e",1}            , {"\x4f",1}            ,
{"\x50",1}            , {"\x51",1}            , {"\x52",1}            , {"\x53",1}            ,
{"\x54",1}            , {"\x55",1}            , {"\x56",1}            , {"\x57",1}            ,
{"\x58",1}            , {"\x59",1}            , {"\x5a",1}            , {"\x5b",1}            ,
{"\x5c",1}            , {"\x5d",1}            , {"\x5e",1}            , {"\x5f",1}            ,
{"\x60",1}            , {"\x61",1}            , {"\x62",1}            , {"\x63",1}            ,
{"\x64",1}            , {"\x65",1}            , {"\x66",1}            , {"\x67",1}            ,
{"\x68",1}            , {"\x69",1}            , {"\x6a",1}            , {"\x6b",1}            ,
{"\x6c",1}            , {"\x6d",1}            , {"\x6e",1}            , {"\x6f",1}            ,
{"\x70",1}            , {"\x71",1}            , {"\x72",1}            , {"\x73",1}            ,
{"\x74",1}            , {"\x75",1}            , {"\x76",1}            , {"\x77",1}            ,
{"\x78",1}            , {"\x79",1}            , {"\x7a",1}            , {"\x7b",1}            ,
{"\x7c",1}            , {"\x7d",1}            , {"\x7e",1}            , {"\x7f",1}            ,
{"\xc2""\x80",2}      , {"\xc2""\x81",2}      , {"\xc2""\x82",2}      , {"\xc2""\x83",2}      ,
{"\xc2""\x84",2}      , {"\xc2""\x85",2}      , {"\xc2""\x86",2}      , {"\xc2""\x87",2}      ,
{"\xc2""\x88",2}      , {"\xc2""\x89",2}      , {"\xc2""\x8a",2}      , {"\xc2""\x8b",2}      ,
{"\xc2""\x8c",2}      , {"\xc2""\x8d",2}      , {"\xc2""\x8e",2}      , {"\xc2""\x8f",2}      ,
{"\xc2""\x90",2}      , {"\xc2""\x91",2}      , {"\xc2""\x92",2}      , {"\xc2""\x93",2}      ,
{"\xc2""\x94",2}      , {"\xc2""\x95",2}      , {"\xc2""\x96",2}      , {"\xc2""\x97",2}      ,
{"\xc2""\x98",2}      , {"\xc2""\x99",2}      , {"\xc2""\x9a",2}      , {"\xc2""\x9b",2}      ,
{"\xc2""\x9c",2}      , {"\xc2""\x9d",2}      , {"\xc2""\x9e",2}      , {"\xc2""\x9f",2}      ,
{"\xc2""\xa0",2}      , {"\xe2""\x80""\x98",3}, {"\xe2""\x80""\x99",3}, {"\xc2""\xa3",2}      ,
{"\xe2""\x82""\xac",3}, {"\xe2""\x82""\xaf",3}, {"\xc2""\xa6",2}      , {"\xc2""\xa7",2}      ,
{"\xc2""\xa8",2}      , {"\xc2""\xa9",2}      , {"\xcd""\xba",2}      , {"\xc2""\xab",2}      ,
{"\xc2""\xac",2}      , {"\xc2""\xad",2}      , {"\x3f",1}            , {"\xe2""\x80""\x95",3},
{"\xc2""\xb0",2}      , {"\xc2""\xb1",2}      , {"\xc2""\xb2",2}      , {"\xc2""\xb3",2}      ,
{"\xce""\x84",2}      , {"\xce""\x85",2}      , {"\xce""\x86",2}      , {"\xc2""\xb7",2}      ,
{"\xce""\x88",2}      , {"\xce""\x89",2}      , {"\xce""\x8a",2}      , {"\xc2""\xbb",2}      ,
{"\xce""\x8c",2}      , {"\xc2""\xbd",2}      , {"\xce""\x8e",2}      , {"\xce""\x8f",2}      ,
{"\xce""\x90",2}      , {"\xce""\x91",2}      , {"\xce""\x92",2}      , {"\xce""\x93",2}      ,
{"\xce""\x94",2}      , {"\xce""\x95",2}      , {"\xce""\x96",2}      , {"\xce""\x97",2}      ,
{"\xce""\x98",2}      , {"\xce""\x99",2}      , {"\xce""\x9a",2}      , {"\xce""\x9b",2}      ,
{"\xce""\x9c",2}      , {"\xce""\x9d",2}      , {"\xce""\x9e",2}      , {"\xce""\x9f",2}      ,
{"\xce""\xa0",2}      , {"\xce""\xa1",2}      , {"\x3f",1}            , {"\xce""\xa3",2}      ,
{"\xce""\xa4",2}      , {"\xce""\xa5",2}      , {"\xce""\xa6",2}      , {"\xce""\xa7",2}      ,
{"\xce""\xa8",2}      , {"\xce""\xa9",2}      , {"\xce""\xaa",2}      , {"\xce""\xab",2}      ,
{"\xce""\xac",2}      , {"\xce""\xad",2}      , {"\xce""\xae",2}      , {"\xce""\xaf",2}      ,
{"\xce""\xb0",2}      , {"\xce""\xb1",2}      , {"\xce""\xb2",2}      , {"\xce""\xb3",2}      ,
{"\xce""\xb4",2}      , {"\xce""\xb5",2}      , {"\xce""\xb6",2}      , {"\xce""\xb7",2}      ,
{"\xce""\xb8",2}      , {"\xce""\xb9",2}      , {"\xce""\xba",2}      , {"\xce""\xbb",2}      ,
{"\xce""\xbc",2}      , {"\xce""\xbd",2}      , {"\xce""\xbe",2}      , {"\xce""\xbf",2}      ,
{"\xcf""\x80",2}      , {"\xcf""\x81",2}      , {"\xcf""\x82",2}      , {"\xcf""\x83",2}      ,
{"\xcf""\x84",2}      , {"\xcf""\x85",2}      , {"\xcf""\x86",2}      , {"\xcf""\x87",2}      ,
{"\xcf""\x88",2}      , {"\xcf""\x89",2}      , {"\xcf""\x8a",2}      , {"\xcf""\x8b",2}      ,
{"\xcf""\x8c",2}      , {"\xcf""\x8d",2}      , {"\xcf""\x8e",2}      , {"\x3f",1}
};

int main() {
    std::string input{"āāāčččēēēē"};
    std::string output;
    for (auto c : input) {
        output.append(itou[static_cast<unsigned char>(c)]);  // index by the raw byte value 0..255
    }

    std::cout << output << std::endl;
}
string FirstName = res->getString("test2col");

Documentation for MySQL Connector: https://dev.mysql.com/doc/dev/connector-cpp/8.0/

It does not seem to say much about this, so thanks for any help!

Here is a code example, based on the solutions in the comments, that produces another error:


#include <iostream>
#include <cppconn/driver.h>
#include <cppconn/exception.h>
#include <cppconn/resultset.h>
#include <cppconn/statement.h>
#include <cppconn/prepared_statement.h>
#include <string>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <cstring>
#include <filesystem>
#include <codecvt>
#include <cstdint>
#include <locale>

Error C4996: 'std::wstring_convert<std::codecvt_utf8<wchar_t,1114111,(std::codecvt_mode)0>,wchar_t,std::allocator<wchar_t>,std::allocator<char>>::to_bytes': warning STL4017: std::wbuffer_convert, std::wstring_convert, and the <codecvt> header (containing std::codecvt_mode, std::codecvt_utf8, std::codecvt_utf16, and std::codecvt_utf8_utf16) are deprecated in C++17. (The std::codecvt class template is NOT deprecated.) The C++ Standard doesn't provide equivalent non-deprecated functionality; consider using MultiByteToWideChar() and WideCharToMultiByte() from <Windows.h> instead. You can define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING or _SILENCE_ALL_CXX17_DEPRECATION_WARNINGS to acknowledge that you have received this warning.

```cpp
     try
            {
                std::unique_ptr<sql::Connection> connection{ nullptr };
                try {
                    sql::Driver* driver = ::get_driver_instance();

                    //sql::Connection* con;
                    //sql::Statement *stmt;
                    //sql::ResultSet* res;
                    //sql::Statement* pstmt;

                    sql::ConnectOptionsMap connection_options{};
                    connection_options["hostName"] = "tcp://127.0.0.1:3306";      // Replace with your log-in
                    connection_options["userName"] = "root"; // ...
                    connection_options["password"] = "parole123!"; // ...
                    connection_options["schema"] = "test";     // ...
                    connection_options["characterSetResults"] = "latin7_general_ci";
                    connection_options["OPT_CHARSET_NAME"] = "latin7_general_ci";
                    connection_options["OPT_SET_CHARSET_NAME"] = "latin7_general_ci";

                    connection.reset(driver->connect(connection_options));
                    driver = get_driver_instance();


                    /* Create a connection */

                    //con = driver->connect("tcp://127.0.0.1:3306", "root", "parole123!");
                    //con->setClientOption("characterSetResults", "UTF8");

                    /* Connect to the MySQL test database */
                    //con->setSchema("test");
                    //pstmt = con->createStatement();
                    std::string const some_query = "SELECT * FROM test2";

                    std::unique_ptr<sql::Statement> statement{ connection->createStatement() };


                    //res = pstmt->executeQuery("SELECT * FROM test2");
                    std::unique_ptr<sql::ResultSet> res{ statement->executeQuery(some_query) };

                    //pstmt->setInt(1, 1);
                    //pstmt->setString(1, str2);
                    //res = pstmt->executeQuery();

                    /* Fetch in reverse = descending order! */

                    /// loop that will use the MySQL data array
                    //res->afterLast();
                    while (res->next()) {
                        std::string const FILE_NAME = res->getString("test2col");
    



                        string locations2 = ("C:\\Users\\Janis\\Desktop\\TEST2\\");
                        string txtt = (".txt");
                        string copy2 = ("copy /-y ");

                        string space = " ";
                        string PACIENTI2 = "C:\\PACIENTI\\";




                        string element = copy2 + locations2 + FILE_NAME + txtt;


                        //string StartTime = res->getString("StartTime");
                        //string VisitID = res->getString("VisitID");
                        //string LastModified = res->getString("LastModified");
                        //string Id = res->getString("Id");
                        //string PatientId = res->getString("PatientId");
                        for (auto& p2 : fs::directory_iterator("C:\\Users\\Janis\\Desktop\\TEST2\\")) {
                            if (FILE_NAME != p2.path().string()) {
                                string cmd = element + space + PACIENTI2 + FILE_NAME + txtt;

                                FILE* pipe = _popen(cmd.c_str(), "r");
                                cout << cmd << endl;

                                /*if (pipe == NULL)
                                {
                                    return;
                                }

                                char buffer[128];
                                std::string result = "";

                                while (!feof(pipe))
                                {
                                    if (fgets(buffer, 128, pipe) != NULL)
                                    {
                                        result += buffer;
                                    }
                                }*/
                                //std::cout << "Results: " << std::endl << result << std::endl ;

                                //_pclose(pipe);
                            }
                        }


                    }
                    //delete res;
                    //delete pstmt;
                    //delete con;




                }
                catch (sql::SQLException& ex) {
                    std::cerr << "Error occurred when connecting to SQL data base: " << ex.what() << "(" << ex.getErrorCode() << ").";
                }


            }
            catch (sql::SQLException& e)
            {
                /// more info is not implemented
                //cout << "# ERR: SQLException in " << __FILE__;
                //cout << "(" << __FUNCTION__ << ") on line " << __LINE__ << endl;
                /* what() (derived from std::runtime_error) fetches error message */
                //cout << "# ERR: " << e.what();
                //cout << " (MySQL error code: " << e.getErrorCode();
                cout << "# ERR: SQLException in " << endl;
            }
```
copy /-y C:\Users\username\Desktop\TEST2\J─ünis.txt C:\PACIENTI\J─ünis.txt
copy /-y C:\Users\username\Desktop\TEST2\Ann─ü.txt C:\PACIENTI\Ann─ü.txt

instead it should be

copy /-y C:\Users\Janis\Desktop\TEST2\Jānis.txt C:\PACIENTI\Jānis.txt
copy /-y C:\Users\Janis\Desktop\TEST2\Annā.txt C:\PACIENTI\Annā.txt


Channa
  • 742
  • 17
  • 28
Ronalds Mazītis
  • 323
  • 2
  • 5
  • 18
  • [SQL Server 2019](https://learn.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-ver15) (15.x) introduces an additional option for UTF-8 encoding. [MySQL](https://dev.mysql.com/doc/refman/5.7/en/charset-unicode.html) has various Unicode encodings support. – Eljay May 30 '21 at 18:59

5 Answers

1

I think the problem in your case is not related to std::wstring: the 8-bit std::string should be sufficient for UTF-8 (creating a simple std::string with the special characters "āàčīēļš" works just fine), while std::wstring uses 2 bytes per character on Windows and 4 bytes on Linux, depending on the operating system. After all, if you have a look at the getString function, you will see that it takes and returns an sql::SQLString. The sql::SQLString class is just a simple wrapper for a std::string.

I think you have to specify utf-8 as the default character set for MySQL. To do so, specify the following connection options when connecting to the database:

std::unique_ptr<sql::Connection> connection {nullptr};
try {
  sql::Driver* driver = ::get_driver_instance();

  sql::ConnectOptionsMap connection_options {};
  connection_options["hostName"] = url;      // Replace with your log-in
  connection_options["userName"] = username; // ...
  connection_options["password"] = password; // ...
  connection_options["schema"] = schema;     // ...
  connection_options["characterSetResults"] = "utf8";
  connection_options["OPT_CHARSET_NAME"] = "utf8";
  connection_options["OPT_SET_CHARSET_NAME"] = "utf8";

  connection.reset(driver->connect(connection_options));
} catch (sql::SQLException& ex) {
  std::cerr << "Error occurred when connecting to SQL data base: " << ex.what() << "(" << ex.getErrorCode() << ").";
}

Then you should be able to query your database as follows:

std::string const some_query = "SELECT * FROM some_table_name;";
std::unique_ptr<sql::Statement> statement {connection->createStatement()};
std::unique_ptr<sql::ResultSet> result {statement->executeQuery(some_query)};
while (result->next()) {
  std::string const some_field = result->getString("some_field_name");
  // Process: e.g. display with std::cout << some_field << std::endl;
}

The problem that now emerges when you create file names from it or print it to the console is Windows itself (I had only tested the code on Linux before and therefore had not run into this issue): by default it uses a legacy ANSI code page, not UTF-8. Even if you output something like āàčīēļš, it will not be displayed correctly, whether you use std::cout or std::wcout in combination with std::wstring. Instead it will output ─ü├á─ì─½─ô─╝┼í.

If you dump the bytes with

#include <iomanip>
#include <iostream>
#include <string>

// Print every byte of str as two uppercase hex digits.
void dump_bytes(std::string const& str) {
  std::cout << std::hex << std::uppercase << std::setfill('0');
  for (unsigned char c : str) {
    std::cout << std::setw(2) << static_cast<int>(c) << ' ';
  }
  std::cout << std::dec << std::endl;
}

it will output C4 81 C3 A0 C4 8D C4 AB C4 93 C4 BC C5 A1 which, plugged back into a byte-to-UTF-8 converter, does in fact give you āàčīēļš. So the string was read correctly, but Windows is just not displaying it correctly. The following, in combination with the previous section (specifying utf-8 as the default character set in MySQL), should fix all your issues:

  • A call to SetConsoleOutputCP(CP_UTF8); from windows.h at the start of the program will fix the console output:

     #include <cstdlib>
     #include <iostream>
     #include <string>
     #include <windows.h>
    
     int main() {
       // Forces console output to UTF8
       SetConsoleOutputCP(CP_UTF8);
       std::string const name = u8"āàčīēļš";
       std::cout << name << std::endl; // Actually outputs āàčīēļš
       return EXIT_SUCCESS;
     }
    
  • Similarly, you will have to adapt the routine that creates the files, because by default it will not use UTF-8 either (the content of the files will not be an issue, but the filename itself will be!). Use std::ofstream from fstream in combination with std::filesystem::u8path from the C++17 filesystem library to resolve this:

     #include <cstdlib>
     #include <filesystem>
     #include <fstream>
     #include <string>
    
     int main() {
       std::string const name = u8"āàčīēļš";
       std::ofstream f(std::filesystem::u8path(name + ".txt")); // Creates a file āàčīēļš.txt
       f << name << std::endl;                                  // Writes āàčīēļš to it
       return EXIT_SUCCESS;
     }
    
2b-t
  • 2,414
  • 1
  • 10
  • 18
  • Almost worked, but where do You get mysql_options. Look at the code I posted from my phone,will beautify later. – Ronalds Mazītis May 26 '21 at 14:16
  • @RonaldsMazītis Ah, sorry that is a typo. Should be `connection_options`. I renamed the variable and forgot this one... – 2b-t May 26 '21 at 14:34
  • :D That is impossible to read like this. ;) I will wait until you have formatted it and then have a look. – 2b-t May 26 '21 at 14:39
  • Hey, I kinda made changes in code, try again ;) – Ronalds Mazītis May 26 '21 at 20:35
  • Please do not post your entire source code over here... Describe your problem appropriately (what are the error messages you get) and make a minimal reproducible example containing only the things one needs to reproduce your problem... Your code contains so many things that it is hard to understand what it should do and where the problem is. Secondly, do not use `std::wstring` as the other answer suggested. The `sql::SQLString` internally uses a `std::string`. Converting it to `std::wstring` therefore makes no sense. – 2b-t May 26 '21 at 22:08
  • I will post a simple example tomorrow that will contain the entire process for a simplified data base. I will let you know as soon as I have updated my answer... – 2b-t May 26 '21 at 22:13
  • @RonaldsMazītis I updated my post. Please just take that code. Change the variable `some_table_name` to the table containing your special characters and `some_field_name` to the name of a field with special characters. Output the variable `some_field` to console, execute the code and tell me what it outputs. Normally this should work like this: No need to use `std::wstring` as the other post suggests! – 2b-t May 27 '21 at 21:11
  • tried Your code, changed the code in question to Yours with results in the end and it still does not display special characters. copy /-y C:\Users\username\Desktop\TEST2\J─ünis.txt C:\PACIENTI\J─ünis.txt – Ronalds Mazītis May 28 '21 at 05:14
  • @RonaldsMazītis Hmmm, I see, just tried it on a Windows operating system (like you) instead of Linux and ran into problems as well but I think I know what the reason is... In fact has nothing to do with mysql-connector. I will try to find a solution in the lunch break. What does `std::cout << "āàčīēļš" << std::endl;` output on your computer? – 2b-t May 28 '21 at 08:59
  • hey that cout is also broken.. what would be the problem? – Ronalds Mazītis May 28 '21 at 09:09
  • @RonaldsMazītis Added a solution. It is only related to how Windows deals with UTF-8. Let me know if it works for you! – 2b-t May 28 '21 at 11:03
  • I still got the problem with this last example - https://ibb.co/Qkpg2Jm also I have set encodings to .cpp file, I even installed fix file encodings extension and it did not help. https://stackoverflow.com/questions/696627/how-to-set-standard-encoding-in-visual-studio – Ronalds Mazītis May 29 '21 at 12:37
  • @RonaldsMazītis Weird, for me this works on Windows 10. Which operating system are you using? Looks like Windows 7. Does the file output work or doesn't it work as well? – 2b-t May 29 '21 at 16:57
  • People are saying to me that Windows console uses only UTF16. – Ronalds Mazītis May 29 '21 at 18:19
  • @RonaldsMazītis This is not true - It uses **Codepage 850** as default: [see here](https://ss64.com/nt/chcp.html), [here](https://stackoverflow.com/a/57134096/9938686) or [here](https://en.wikipedia.org/wiki/Code_page_850). Everything I told you was tested on Windows 10 with Visual Studio 2019 Community Edition and I had the same issues as you in the beginning but the tricks above fixed it. **Could you tell me if the last code snippet with the output to file actually works for you? Can you supply information about your operating system and your Visual Studio version?** – 2b-t May 29 '21 at 21:28
  • Well I have 64 bit windows 7 home premium Service Pack 1, Visual Studio Community 2019, 16.10.0. But same stuff happens in Windows 10 (not sure which version) on test environment virtualbox. Microsoft wants to stay in touch and I will have to register in 5 days. – Ronalds Mazītis May 30 '21 at 12:06
  • @RonaldsMazītis Does the output to file not work either? – 2b-t May 30 '21 at 15:25
  • Output file works, but the name is crippled. – Ronalds Mazītis May 30 '21 at 21:50
  • @RonaldsMazītis So neither of the fixes works for you, on Windows 7 or on Windows 10? But you can rename these files manually to contain these special characters, right? – 2b-t May 30 '21 at 22:56
  • I am not going to have time to rename them all manually. – Ronalds Mazītis May 31 '21 at 09:30
  • https://stackoverflow.com/questions/67773353/special-characters-in-visual-studio-2019-c-project-and-executing-cmd-commands – Ronalds Mazītis May 31 '21 at 12:06
  • @RonaldsMazītis Sorry for only replying now. I have been on holidays over the weekend and took a few more days off and could not test it again on my computer. I found out that you are indeed right and when posting the code onto StackOverflow I forgot to prepend the strings with the `u8`-literal: It should be `u8"āàčīēļš"` and not only `"āàčīēļš"`. Can you try it again with the corrected code (see above) and let me know if that works for you? – 2b-t Jun 02 '21 at 21:30
0

For that you need to convert your string into std::wstring.

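#include <codecvt>
#include <locale>
#include <string>

std::string FirstName = res->getString("FirstName");

// Convert the UTF-8 bytes to a wide string (std::wstring_convert is deprecated since C++17).
std::wstring firstNameWstring = std::wstring_convert<std::codecvt_utf8<wchar_t>>().from_bytes(FirstName);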
  • Error C4996: 'std::wstring_convert<std::codecvt_utf8<wchar_t,1114111,(std::codecvt_mode)0>,wchar_t,std::allocator<wchar_t>,std::allocator<char>>::to_bytes': warning STL4017: std::wbuffer_convert, std::wstring_convert, and the <codecvt> header (containing std::codecvt_mode, std::codecvt_utf8, std::codecvt_utf16, and std::codecvt_utf8_utf16) are deprecated in C++17. (The std::codecvt class template is NOT deprecated.) – Ronalds Mazītis May 26 '21 at 14:09
  • If you want to remove this warning, then you can add #define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING in header file. There might be other methods in c++17 but I have not tested it. Above code I have tested in my project. – Sanket Bhurke May 27 '21 at 04:00
0

The problem is that different parts of your code use different encodings for text data.

Since MySQL uses UTF-8, you can simply change your program to use UTF-8 everywhere. This can be achieved with build flags:

cl /source-charset:utf-8 /execution-charset:utf-8 /EHsc YourSources.cpp
  • /source-charset:utf-8 - says that your source file uses UTF-8 encoding. Since your source may use a different encoding, make sure you pass the correct parameter; still, it is highly recommended to encode source code in a universal standard (so UTF-8), so developers from different countries can work on the code without problems.
  • /execution-charset:utf-8 - says that string literals stored in the executable should be encoded as UTF-8.

Now the only problem will be the console (cmd). By default the Windows console uses an encoding specific to the language setting of your system (inherited for compatibility with old DOS applications). As a result, when you force your executable to use UTF-8, the console will by default print those strings incorrectly.

Changing code page of console so it would use utf-8 will fix the issue.

Here is a test program I wrote to demonstrate how to handle encoding in C++:

#include <iostream>
#include <locale>
#include <exception>
#include <string>

void setupLocale(int argc, const char *argv[])
{
    std::locale def{""};
    std::locale::global(argc > 1 ? std::locale{argv[1]} : def);
    auto streamLocale = argc > 2 ? std::locale{argv[2]} : def;
    std::cout.imbue(streamLocale);
    std::cin.imbue(streamLocale);
}

void printSeparator()
{
    std::cout << "---------\n";
}

void printTestStuff()
{
    std::cout << "Wester Europe: āāāčččēēēēßÞÖöñÅÃ\n";
    std::cout << "Central Europe: ąĄÓóŁłĘężćźŰűÝýĂă\n";
    std::cout << "China: 字集碼是把字符集中的字符编码为指定集合中某一对象\n";
    std::cout << "Korean: 줄여서 인코딩은 사용자가 입력한\n";
}

int main(int argc, const char *argv[]) {
    try{
        setupLocale(argc, argv);
        printSeparator();
        printTestStuff();
        printSeparator();
    }
    catch(const std::exception& e)
    {
        std::cerr << e.what() << '\n';
    }
}

When you copy and paste this program, remember to save the source as UTF-8 (since I used a wide range of characters for testing, most other encodings will simply fail: the text will not be displayed correctly when the file is opened).

Now this is what I see on my terminal (copy paste):

C:\Users\User\Downloads>cl /source-charset:utf-8 /execution-charset:utf-8 /EHsc encodings.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29336 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

encodings.cpp
Microsoft (R) Incremental Linker Version 14.28.29336.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:encodings.exe
encodings.obj

C:\Users\User\Downloads>chcp
Active code page: 437

C:\Users\User\Downloads>encodings.exe
---------
Wester Europe: Ä?Ä?Ä?Ä?Ä?Ä?Ä"Ä"Ä"Ä"AYAzA-AA±A.Aƒ
Central Europe: Ä.Ä,A"A3Å?Å,Ä~ÄTżÄ╪źŰűA?A½Ä,ă
China: å--é>+碼æ~_æSSå--ç¬▌é>+ä,-çs,å--ç¬▌ç¼-ç ?ä,ºæO╪årsé>+å?^ä,-æY?ä,?å_1象
Korean: ì,ì-¬ì,o ì?,ì½"ë"cì?? ì,¬ìscìz?ê°? ìz.ë ¥ío
---------

C:\Users\User\Downloads>encodings.exe .65001
---------
Wester Europe: aaaccceeeeß_ÖöñÅA
Central Europe: aAOóLlEezczUuYyAa
China: ????????????????????????
Korean: ??? ???? ???? ???
---------

C:\Users\User\Downloads>encodings.exe .65001 .437
---------
Wester Europe: aaaccceeeeß_ÖöñÅA
Central Europe: aAOóLlEezczUuYyAa
China: ????????????????????????
Korean: ??? ???? ???? ???
---------

C:\Users\User\Downloads>encodings.exe .65001 .1250
---------
Wester Europe: aaaccceeeeß_ÖöñÅA
Central Europe: aAOóLlEezczUuYyAa
China: ????????????????????????
Korean: ??? ???? ???? ???
---------

C:\Users\User\Downloads>chcp 1250
Active code page: 1250

C:\Users\User\Downloads>encodings.exe .65001 .1250
---------
Wester Europe: aaačččeeeeß?ÖönAA
Central Europe: ąĄÓóŁłĘężćźŰűÝýĂă
China: ????????????????????????
Korean: ??? ???? ???? ???
---------

C:\Users\User\Downloads>chcp 65001
Active code page: 65001

C:\Users\User\Downloads>encodings.exe
---------
Wester Europe: ÄÄÄÄÄÄēēēēßÞÖöñÅÃ
Central Europe: ąĄÓóÅłĘężćźŰűÃýĂă
China: 字集碼是把字符集中的字符编ç ä¸ºæŒ‡å®šé›†åˆä¸­æŸä¸€å¯¹è±¡
Korean: 줄여서 ì¸ì½”ë”©ì€ ì‚¬ìš©ìžê°€ 입력한
---------

C:\Users\User\Downloads>encodings.exe .65001
---------
Wester Europe: āāāčččēēēēßÞÖöñÅÃ
Central Europe: ąĄÓóŁłĘężćźŰűÝýĂă
China: 字集碼是把字符集中的字符编码为指定集合中某一对象
Korean: 줄여서 인코딩은 사용자가 입력한
---------

C:\Users\User\Downloads>encodings.exe .65001 .65001
---------
Wester Europe: āāāčččēēēēßÞÖöñÅÃ
Central Europe: ąĄÓóŁłĘężćźŰűÝýĂă
China: 字集碼是把字符集中的字符编码为指定集合中某一对象
Korean: 줄여서 인코딩은 사용자가 입력한
---------

C:\Users\User\Downloads>

As you can see, if the code page and encodings are set up properly, everything just works (without using the Windows API).

On my machine cmd.exe is unable to display Asian characters properly, but this is just a font issue: when I copy and paste the cmd.exe content into another program, everything is displayed correctly (provided the encodings are set up correctly in my program).

Note also that if a conversion is not possible, question marks are displayed or a fallback to some other character is performed (for example, Å was converted to A when the Windows-1250 encoding and code page were used).

Play with this program a bit; I'm pretty sure it should be enough to give you the full picture.

Most important:

  • std::locale::global defines which encoding the main program uses. It means that streams assume std::string values are encoded this way.
  • std::iostream::imbue defines the encoding of the stream, so what will be written to a file.
  • These two settings define both sides of the conversion! You do not have to do the conversion manually!
Marek R
  • 32,568
  • 6
  • 55
  • 140
  • This does not help ether - C:\Users\Janis\source\repos\baltics\baltics\baltics.cpp /source-charset:utf-8 /execution-charset:utf-8 /EHsc baltics.cpp Or that chcp 65001 did not change anything. – Ronalds Mazītis May 29 '21 at 13:20
  • Your comment shows (especially attempt to explain what have you done) you didn't understood my unswear and didn't follow my instructions. I wonder how I can improve my answer to guide in correct direction? – Marek R May 30 '21 at 08:55
  • Dude, I added those flags and it did not change anything. Nor do changing coding trough console. – Ronalds Mazītis May 30 '21 at 12:01
  • @RonaldsMazītis I had some time to improve my answer. Please try my demo program on your machine and play with it. I think this should be enough to make your code work properly without fighting with encoding yourself. – Marek R Jun 03 '21 at 09:42
  • @RonaldsMazītis did you try the improved version? – Marek R Jun 08 '21 at 07:19
0

C4 81 C3 A0 C4 8D C4 AB C4 93 C4 BC C5 A1, when interpreted as various encodings:

utf8:    āàčīēļš
latin1:  Äà Äīēļš
latin7:  ÄĆÄ Ä Ä«ÄļŔ
euckr:   훮횪훾카휆캬큄

Presumably, you wanted utf8 (or utf8mb4).
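
The difference between the interpretations above can be reproduced in C++ with a minimal sketch like the one below. The helper names decode_utf8 and decode_latin1 are made up for illustration, and the decoder assumes well-formed input (it does no validation). Read as UTF-8, the 14 bytes give 7 code points (U+0101 for ā, U+00E0 for à, and so on); read as Latin-1, every byte becomes its own character, which is where the mojibake comes from.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode UTF-8 bytes into Unicode code points.
// Sketch only: assumes well-formed input and performs no validation.
std::vector<std::uint32_t> decode_utf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp;
        std::size_t len;
        if (lead < 0x80)      { cp = lead;        len = 1; }  // ASCII
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }  // 110xxxxx
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }  // 1110xxxx
        else                  { cp = lead & 0x07; len = 4; }  // 11110xxx
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Interpret the same bytes as Latin-1: every byte is one code point.
std::vector<std::uint32_t> decode_latin1(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes) {
        out.push_back(b);
    }
    return out;
}
```

Calling decode_utf8 on the byte string "\xC4\x81\xC3\xA0\xC4\x8D\xC4\xAB\xC4\x93\xC4\xBC\xC5\xA1" yields 7 code points, while decode_latin1 yields 14.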

The connection parameters specify the encoding of the client. If that hex string is coming from the client, then specify utf8 (or utf8mb4).

The encoding in the database table can be the same or something different. That is specified in the schema; for example:

CREATE TABLE ...
    stuff VARCHAR(99) CHARACTER SET utf8
    ...

When INSERTing and SELECTing, MySQL will convert (if necessary) between the client's encoding and the column's encoding.

Discovery:

To see the client settings:

SHOW VARIABLES LIKE 'char%'; 

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin7                     | <--
| character_set_connection | latin7                     | <--
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | latin7                     | <--
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8mb3                    |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

To see the column's encoding: SHOW CREATE TABLE tablename;.

To see the actual bytes in a column:

SELECT col, HEX(col) ...

Virtually every other technique is error-prone.
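
The same kind of check can be done on the client side, on the std::string the connector hands back. The following is a minimal sketch; to_hex is a hypothetical helper, not part of any MySQL library. It separates bytes with spaces to match the notation used in this answer, whereas MySQL's HEX() prints them without separators.

```cpp
#include <cstddef>
#include <string>

// Render the raw bytes of a string as upper-case hex pairs separated by spaces.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) {
            out += ' ';
        }
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    return out;
}
```

If the value fetched from the database is "āà" and the connection really delivers UTF-8, to_hex gives "C4 81 C3 A0"; anything else means a conversion happened somewhere on the way.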

Note that the accented characters you mentioned all have UTF-8 hex like

C3 xx
C4 xx
C5 xx
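
Why those lead bytes: in a two-byte UTF-8 sequence the lead byte is 0xC0 plus the code point shifted right by 6 bits, so U+00C0..U+00FF start with C3, U+0100..U+013F with C4, and U+0140..U+017F with C5. A minimal encoder sketch (encode_utf8 is an illustrative name; it does not reject surrogates or out-of-range values) shows the arithmetic:

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8.
// Sketch only: surrogates and values above U+10FFFF are not rejected.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));           // lead byte
        out += static_cast<char>(0x80 | (cp & 0x3F));         // continuation
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, encode_utf8(0x00E0) (à) produces the bytes C3 A0, encode_utf8(0x0101) (ā) produces C4 81, and encode_utf8(0x0161) (š) produces C5 A1.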

Terminology:

"Character set" is the encoding. Examples: latin1, latin7, utf8, utf8mb4.

"Collation" refers to how values are sorted and compared. Example: latin7_general_ci ("ci" is short for "case insensitive"; with no explicit accent suffix in the name, such a collation is accent insensitive as well).

(Your connection parameters are confusing the character set and collation.)

"utf8" or "utf8mb3" is MySQL's encoding that uses up to 3 bytes per character. It is deprecated, but still quite valid for all European languages, plus much of the rest of the world.

"utf8mb4" is MySQL's encoding that uses up to 4 bytes per character. Equivalent to UTF-8.

"UTF-8" is the rest of the world's name for it.
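
To tell whether a given UTF-8 string fits into utf8mb3 or needs utf8mb4, it is enough to look for 4-byte sequences, whose lead byte is 0xF0 or higher. A minimal sketch, assuming well-formed UTF-8 input (needs_utf8mb4 is a made-up helper name):

```cpp
#include <string>

// Returns true if the UTF-8 text contains a 4-byte sequence,
// i.e. a character that utf8mb3 cannot store.
// Sketch only: assumes the input is well-formed UTF-8.
bool needs_utf8mb4(const std::string& utf8) {
    for (unsigned char b : utf8) {
        if (b >= 0xF0) {  // 11110xxx starts a 4-byte sequence
            return true;
        }
    }
    return false;
}
```

The Baltic letters from the question are all 2-byte sequences, so needs_utf8mb4 returns false for them; an emoji such as the bytes F0 9F 98 80 makes it return true.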

Common mistakes:

  • Failure to declare the client's encoding.
  • Attempts to convert the encoding in the client.
Rick James
  • I do not think problem is internal organization of data in database. Problem is on communication between database and application. C++ sucks in handling text encoding. – Marek R May 30 '21 at 09:00
  • @MarekR C++ on Windows sucks at it. On Linux UTF-8 does not cause those issues. – 2b-t Jun 01 '21 at 00:57
0

Visual Studio lets you change the file encoding. So if you type the characters directly into a variable in your source, the file should be saved in that encoding.

To set the option in Visual Studio:

  • Open the project Property Pages dialog box. ...
  • Select the Configuration Properties > C/C++ > Command Line property page.
  • In Additional Options, add the /utf-8 option to specify your preferred encoding.
  • Choose OK to save your changes.

Link: https://learn.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=msvc-160
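
One rough way to check what the compiler actually stored for a literal is to compare its bytes against the expected UTF-8 sequence. This is only a sketch: literal_is_utf8 is a hypothetical helper, and it assumes the source file itself is saved as UTF-8. If it returns false, the literal was re-encoded (or replaced by a fallback character) somewhere between the editor and the executable.

```cpp
#include <cstring>

// True when the compiler stored the literal "ā" as the two UTF-8 bytes C4 81
// (plus the terminating null). Assumes this source file is saved as UTF-8.
bool literal_is_utf8() {
    const char text[] = "ā";
    return sizeof(text) == 3 && std::memcmp(text, "\xC4\x81", 2) == 0;
}
```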

Now for the database: make sure your database has the right character set. https://www.a2hosting.com/kb/developer-corner/mysql/convert-mysql-database-utf-8#:~:text=mysql-,To%20change%20the%20character%20set%20encoding%20to%20UTF%2D8%20for,q%20at%20the%20mysql%3E%20prompt.

Once you have the right encoding for both the DB and Visual Studio, you can save, edit and read those characters from C++ code.

Jin Thakur