How to convert extended ASCII characters from char to wstring in C++

Question

I read a file into a char* memory block using ifstream's read method, then call my GetChar method on each char in the block to build a wstring. I am trying to write the characters to the screen. I first assumed they were Unicode, but it turns out they are extended ASCII. So far the only way I have got this to work is to hard-code the Unicode characters in a switch statement, but I would rather it worked for any character, not just the ones I have encountered and added by hand.

Here is what I'm currently using:

std::wstring GetChar(char o)
{
  switch (o)
  {
  case 0x0D:
    return L"♪";
  case 0x0A:
    return L"◙";
  case -38:
    return L"┌";
  case -60:
    return L"─";
  case -77:
    return L"│";
  case -65:
    return L"┐";
  case -61:
    return L"├";
  case -76:
    return L"┤";
  case -2:
    return L"■";
  }

  std::wstring tmp_string(1, o);
  return tmp_string;
}

Any idea how to convert -38 to L"┌" in a generic way? [Edit] I discovered that my mappings are actually extended ASCII; see the webpage https://www.sciencebuddies.org/science-fair-projects/references/ascii-table

What I think I will try is creating a text file with the extended ASCII mapping based on this webpage: https://theasciicode.com.ar/ Is there a simpler programmatic way (e.g. with setlocale)?

The C++ go-to library for all matters Unicode is [ICU](https://icu.unicode.org/) (at least until we finally get full Unicode support in the standard, which I was told *might* happen in C++23). – DevSolar, Feb 10 '22 at 09:05
There is no such thing as "Extended ASCII", really. Chars outside of standard ASCII (0-127) are *locale-dependent*. What you need is a Unicode library that understands **codepages** or **charsets**. The site you linked to says the "Extended" characters are in [**codepage 437**](https://en.wikipedia.org/wiki/Code_page_437) (aka "DOS Latin US", "DOS OEM US", "IBM437"), which encodes `┌` (U+250C) as byte 0xDA (-38), `─` (U+2500) as byte 0xC4 (-60), etc. (FYI, other similar DOS codepages encode those characters in the same way). Most popular Unicode libraries (iconv, ICU, etc.) handle that charset. – Remy Lebeau, Feb 11 '22 at 23:09
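
The code-page point in the comment above can be sketched roughly as follows, assuming the input bytes really are CP437. The names Cp437ToWide and Cp437ToWideString are hypothetical, and the table covers only the bytes listed in the question; a real program would fill in the whole code page or hand the job to a library such as ICU or iconv.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps one CP437 byte to its Unicode code point.
// Only the bytes from the question are listed; any other byte in the
// upper half falls back to L'?' until the table is completed.
wchar_t Cp437ToWide(char c)
{
    // Plain char may be signed, so 0xDA can arrive as -38. Cast first.
    const unsigned char byte = static_cast<unsigned char>(c);
    switch (byte)
    {
    case 0x0A: return L'\u25D9'; // ◙ (CP437 glyph for the line feed byte)
    case 0x0D: return L'\u266A'; // ♪ (CP437 glyph for the carriage return byte)
    case 0xB3: return L'\u2502'; // │
    case 0xB4: return L'\u2524'; // ┤
    case 0xBF: return L'\u2510'; // ┐
    case 0xC3: return L'\u251C'; // ├
    case 0xC4: return L'\u2500'; // ─
    case 0xDA: return L'\u250C'; // ┌
    case 0xFE: return L'\u25A0'; // ■
    default:
        // Plain ASCII maps to itself; unlisted high bytes become '?'.
        return byte < 0x80 ? static_cast<wchar_t>(byte) : L'?';
    }
}

// Converts a whole buffer, e.g. the block filled by ifstream::read.
std::wstring Cp437ToWideString(const char* data, std::size_t size)
{
    std::wstring out;
    out.reserve(size);
    for (std::size_t i = 0; i < size; ++i)
        out.push_back(Cp437ToWide(data[i]));
    return out;
}
```

Working with the unsigned byte value (0xDA rather than -38) keeps the cases in line with published code-page tables and does not depend on whether char is signed on the target platform.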

Code Gorilla · Answer 1 · 2022-02-10T09:07:51.413

0

You could use a std::map<char,wstring>.

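#include <map>
#include <string>

std::wstring ConvertChar(const char target)
{
    // Keys are written as byte values and cast to char, so the lookup works
    // whether char is signed (0xDA == -38) or unsigned on this platform.
    static const std::map<char, std::wstring> convert = {
        {static_cast<char>(0xDA), L"┌"}, {static_cast<char>(0xB4), L"┤"} /* .... */};
    const auto found = convert.find(target);
    if (found != convert.end())
        return found->second; // the mapped wide string, not the key/value pair
    return L" "; // not found
}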

Alternatively, there are functions that convert between char and wide characters; see "How to convert char* to wchar_t*?".

edited Feb 10 '22 at 09:07

answered Feb 10 '22 at 09:00

I tried the char* to wchar_t* link, but it doesn't work for my case (I could try again though). I suppose I could also store the mappings in an external Unicode text file that users could add missing items to. I might even be able to use a mapping table on the web to programmatically generate the text file. According to this page, https://unicodelookup.com/#latin/1, 0xDA is Ú, not ┌, so maybe what I am looking for isn't standard Unicode (I can convert to Ú). Anyway, thanks for your help, I'm closer to a solution now. – Twinsen Feb 10 '22 at 09:53
I thought you wanted a non-standard mapping. The function above should work then, you can use that to map A => Z if you really wanted to :) You could populate the map from a file as well. – Code Gorilla Feb 10 '22 at 13:19

score 0 · Answer 2 · answered Feb 10 '22 at 15:06

I found a solution I am happy with. I wrote a key file containing the bytes 00 to FF, then viewed it with eXtreme (no longer exists), which can display the extended ASCII glyphs. I copied those glyphs into Notepad++ and saved them as a Unicode file. That gave me a mapping from 0x00 to 0xFF to the corresponding extended ASCII characters (stored as Unicode), all of which display correctly as wstrings with no hard-coded values. I may want to support regular Unicode as another mode some day, but extended ASCII is good enough for the files I am currently working with. Here is an archived version of eXtreme if anyone needs it: http://members.iinet.net.au/~bertdb/ryan/eXtreme/
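
For anyone wanting to try the same approach, here is a rough sketch of how such a mapping file could be loaded and applied. It is not the code I actually used. It assumes the file was saved as UTF-16 little-endian (with or without a byte-order mark), holds exactly one character per byte value in order from 0x00 to 0xFF, and contains only characters from the Basic Multilingual Plane. The names LoadMappingTable and ConvertBuffer are made up for the example.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads a UTF-16LE mapping file into a wstring whose
// index is the original byte value. Assumes BMP characters only, so every
// character occupies exactly two bytes in the file.
std::wstring LoadMappingTable(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());

    std::size_t i = 0;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        i = 2; // skip the byte-order mark if present

    std::wstring table;
    for (; i + 1 < bytes.size(); i += 2)
        table.push_back(static_cast<wchar_t>(bytes[i] | (bytes[i + 1] << 8)));
    return table;
}

// Converts a raw buffer by looking each byte up in the table.
std::wstring ConvertBuffer(const char* data, std::size_t size,
                           const std::wstring& table)
{
    std::wstring out;
    out.reserve(size);
    for (std::size_t k = 0; k < size; ++k)
    {
        // Cast to unsigned so bytes such as 0xDA index the table as 218, not -38.
        const unsigned char index = static_cast<unsigned char>(data[k]);
        out.push_back(index < table.size() ? table[index] : L'?');
    }
    return out;
}
```

Because the table is just data, users can fix or extend a mapping by editing the file, with no recompilation needed.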
