Align non UTF-8 characters when printing in C++

Question

I am trying to figure out a generic solution that can be used to align text to the left and right of a specified line width.

Note: the text can be in almost any international language, English, Japanese, Chinese etc.

For example:

  std::wstring str1 = L"Hello1";
  std::wstring str2 = L"Hello2";
  std::cout << std::string(50, '-') << std::endl;
  std::wcout << std::left << std::setw(25) << str1 << std::right << std::setw(25) << str2 << std::endl;

Produces the following:

--------------------------------------------------
Hello1                                      Hello2

The "----" line is 50 characters wide (ignoring the new line) and the two strings "Hello1" and "Hello2" are aligned to the left and right.

But the issue is with the following:

  std::wstring str1 = L"Hello1";
  std::wstring str2 = L"Hello2";
  std::wstring str3 = L"こんにちは";
  std::wstring str4 = L"你好";

  std::cout << std::string(50, '-') << std::endl;
  std::wcout << std::left << std::setw(25) << str1 << std::right << std::setw(25) << str2 << std::endl;
  std::wcout << std::left << std::setw(25) << str3 << std::right << std::setw(25) << str4 << std::endl;
  std::cout << std::left << std::setw(25) << "こんにちは" << std::right << std::setw(25) << "你好" << std::endl;

Which produces the following:

--------------------------------------------------
Hello1                                      Hello2
S�kao                                           `}
こんにちは                             你好

I have tried to figure out a way to right-align the third row, without success. Any ideas?
I also do not understand why the second row is printed as "junk". Any idea how to fix this row without a major change?
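
One observation about the "junk" row, offered as a sketch rather than a confirmed diagnosis: the bytes printed are exactly the low 8 bits of each code point in `str3` and `str4` (U+3053 gives 0x53 'S', U+306B gives 0x6B 'k', U+4F60 gives 0x60 '`', and so on). That is consistent with the wide stream narrowing each `wchar_t` instead of encoding it, which is what tends to happen when no suitable locale has been set. The helper `low_bytes` below is hypothetical and only reproduces the pattern; it is not what the standard library actually runs.

```cpp
#include <string>

// Hypothetical helper: keep only the low 8 bits of each code point,
// mimicking a lossy wide-to-narrow conversion. This is an assumption
// about what happened to the output, not the library's real code path.
std::string low_bytes(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        out.push_back(static_cast<char>(cp & 0xFF));
    }
    return out;
}
```

Applied to the code points of "こんにちは" this yields the bytes 0x53 0x93 0x6B 0x61 0x6F, matching the `S�kao` in the output above (0x93 is not printable on its own, hence the replacement character).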

You've mixed up UTF16, UTF8 and ASCII strings. `string` and `std::string` are used for UTF8 or ASCII text. UTF8 text literals use the `u8` prefix. For UTF16 the correct type is `u16string`. `wstring` and `L` are for double-byte strings that are a *subset* of UTF16. So `L"こんにちは"` is a double-byte string while `"こんにちは"` is just an ASCII string that can't represent any of those characters — Panagiotis Kanavos, Feb 11 '20 at 12:41
Your title says "non UTF-8" - Is there a particular reason why you don't want to use UTF-8? You are using string literals in your example. Make sure that the encoding of the source file, the encoding of the compiler, and the encoding you use when printing match (and please tell us what encoding(s) you use). Last but not least, you seem to be missing that chars and wchars in multibyte strings are codepoints for multibyte characters, which can be anything from 1 to 4 bytes. You want to count the multibyte characters and then use that in setw. — Max Vollmer, Feb 11 '20 at 12:44
Does this answer your question? [How to get the accurate length of a std::string?](https://stackoverflow.com/questions/31652407/how-to-get-the-accurate-length-of-a-stdstring) — Max Vollmer, Feb 11 '20 at 12:44
Check [String and character literals](https://learn.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=vs-2019). Since you want to use UTF8, you should use the `u8` prefix in all string literals. Note that C++20 introduces the `char8_t` and `u8string` types, to avoid confusion between ASCII and UTF8. — Panagiotis Kanavos, Feb 11 '20 at 12:45
Once you get Unicode involved, things like "align" become **really** complicated. That's no fault of Unicode. It covers all of the world's languages, and there's a great variety. You see this in the Unicode `Right-to-Left (RTL)` and `Left-to-Right (LTR)` codepoints. These directly affect alignment — MSalters, Feb 11 '20 at 14:15
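
The suggestion in the comments (count characters rather than bytes, then pad accordingly) can be sketched as follows. This is a minimal illustration, not a complete solution: it assumes the input is valid UTF-8, it pads manually instead of using `std::setw` (which counts `char` units, not columns), and it treats a small, simplified set of CJK ranges as two columns wide. The names `next_code_point`, `is_wide`, `display_width` and `align_line` are made up for this example, and the range table is an approximation of East Asian Wide characters, not the full Unicode data. Combining marks and right-to-left text are not handled.

```cpp
#include <cstddef>
#include <string>

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Assumes well-formed input; a stray byte is returned unchanged.
char32_t next_code_point(const std::string& s, std::size_t& i) {
    unsigned char b0 = static_cast<unsigned char>(s[i++]);
    int extra = 0;
    char32_t cp = b0;
    if (b0 >= 0xF0)      { extra = 3; cp = b0 & 0x07; }
    else if (b0 >= 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if (b0 >= 0xC0) { extra = 1; cp = b0 & 0x1F; }
    while (extra-- > 0 && i < s.size()) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Simplified, approximate table of code points that most terminals
// draw two cells wide (an assumption, not the full Unicode table).
bool is_wide(char32_t cp) {
    struct Range { char32_t first, last; };
    static const Range wide[] = {
        {0x1100, 0x115F}, {0x2E80, 0x303E}, {0x3041, 0x33FF},
        {0x3400, 0x4DBF}, {0x4E00, 0x9FFF}, {0xA000, 0xA4CF},
        {0xAC00, 0xD7A3}, {0xF900, 0xFAFF}, {0xFE30, 0xFE4F},
        {0xFF00, 0xFF60}, {0xFFE0, 0xFFE6}, {0x20000, 0x3FFFD},
    };
    for (const Range& r : wide) {
        if (cp >= r.first && cp <= r.last) return true;
    }
    return false;
}

// Number of terminal columns the UTF-8 string is expected to occupy.
std::size_t display_width(const std::string& utf8) {
    std::size_t cols = 0;
    for (std::size_t i = 0; i < utf8.size();) {
        cols += is_wide(next_code_point(utf8, i)) ? 2 : 1;
    }
    return cols;
}

// Put `left` at the left edge and `right` at the right edge of a line
// that is `width` columns wide; falls back to one space if too long.
std::string align_line(const std::string& left, const std::string& right,
                       std::size_t width) {
    std::size_t used = display_width(left) + display_width(right);
    std::size_t gap = used < width ? width - used : 1;
    return left + std::string(gap, ' ') + right;
}
```

With a UTF-8 source file and a UTF-8 terminal, `std::cout << align_line("こんにちは", "你好", 50) << '\n';` should line up with the 50-dash rule, provided the terminal font really draws those glyphs two cells wide.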

0 Answers