I'm having a problem in a Unicode build. I have a string table entry whose value is a list of items separated by `;`. I've been at this all day, and I always end up with an immediate runtime error.

The string table entry looks like:

`blah;blah;foo;bar;car;star`

Then the code:

// More than enough size for this
const int bufferSize = 2048;

// Resource ID to a StringTable
int resid = IDS_MAP;
wchar_t readMap[bufferSize];            
resid = LoadString(NULL, resid, readMap, bufferSize);  

wchar_t* line;
line = wcstok(readMap,L";");

while (line != NULL) {

    line = wcstok(NULL,L";");
    wstring wstr(line); // Problem
    string str(wstr.begin(), wstr.end()); // Problem

    MessageBox(0,line,0,0); // No problem
}

The trouble starts when I try to convert the `wchar_t* line` to a `wstring` and then to a `string`. If I comment out those two lines, it runs fine and the message box shows properly.

Any ideas? Asking this question here was my last resort. Thanks.

Evan Carslake
  • `string` holds narrow characters. It's not compatible with `wchar_t`. You might be more productive with `wstring`. – bmargulies Sep 16 '15 at 01:12
  • @bmargulies Is there a way to convert `wstring` to `string`? I thought I saw some examples of that somewhere, but they didn't work for me either. Would converting with `MultiByteToWideChar` be the way to go? (I haven't tried it.) – Evan Carslake Sep 16 '15 at 01:41
  • About `wstring` to `string`: http://stackoverflow.com/questions/4804298/how-to-convert-wstring-into-string –  Sep 16 '15 at 01:44
  • Other way around, you seem to be looking for `WideCharToMultiByte`. Note that it's a C API, so using it correctly is slightly cumbersome. – user253751 Sep 16 '15 at 03:21
  • You need to check `line != NULL` before writing `wstring wstr(line)`. – M.M Sep 16 '15 at 05:53
  • I'm probably missing something here, but if `LoadString` is `LoadStringW` (Unicode) then `MessageBox` is `MessageBoxW`. The issue is not _how_ to convert to `char*` but _why_. – MSalters Sep 16 '15 at 08:29
  • @M.M I'll add it before the first iteration. @MSalters The whole program is Unicode (as I am using GDI+). The reason for the conversion is that I need to do string comparisons. I have a block of text to read one character at a time, and another I need to read a chunk at a time. The main problem was the string-converting part, like `if (wstring == L"string")`. – Evan Carslake Sep 16 '15 at 16:51
  • @EvanCarslake You made it worse. There needs to be a check in between `line = wcstok`... and `wstring wstr(line);`. – M.M Sep 16 '15 at 23:26
  • @M.M I didn't update my code here, but yes, I misunderstood where to check for null, and that is now fixed. Even though I am using Unicode throughout the program, I decided to use ANSI for this; it's a lot simpler. – Evan Carslake Sep 16 '15 at 23:28

1 Answer

This statement:

line = wcstok(readMap,L";");

Reads the first delimited line in the buffer. OK.

However, in your loop, this statement:

line = wcstok(NULL,L";");

Is at the top of the loop, so the first iteration throws away that first token and reads the next one. Eventually the loop reaches the end of the buffer and `wcstok()` returns `NULL`, but you do not check for that condition before using `line`:

line = wcstok(readMap,L";"); // <-- reads the first line

while (line != NULL) {

    line = wcstok(NULL,L";"); // <-- 1st iteration throws away the first line
    wstring wstr(line); // <-- line will be NULL on last iteration

    //...
}

The `line = wcstok(NULL, L";");` statement needs to be moved to the bottom of the loop instead:

wchar_t* line = wcstok(readMap, L";");

while (line != NULL)
{
    // use line as needed...

    line = wcstok(NULL, L";");
}

I would suggest changing the while loop into a for loop to enforce that:

for (wchar_t* line = wcstok(readMap, L";"); (line != NULL); line = wcstok(NULL, L";"))
{
    // use line as needed...
}
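As a minimal sketch of the corrected loop, the hypothetical helper `tokenize` below collects every token into a vector. It is written against the standard three-argument `std::wcstok`, which takes an extra state pointer, so that it also compiles outside MSVC; the two-argument call used above is the Microsoft-specific form. The helper name and the by-value copy of the buffer are assumptions for illustration, not part of the original code.

```cpp
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: splits a wide string on ';' using the standard
// three-argument std::wcstok. The buffer is taken by value because
// wcstok writes terminators into the string it scans.
std::vector<std::wstring> tokenize(std::wstring buffer)
{
    std::vector<std::wstring> tokens;
    wchar_t* state = nullptr; // scan position kept between calls

    for (wchar_t* tok = std::wcstok(&buffer[0], L";", &state);
         tok != nullptr;
         tok = std::wcstok(nullptr, L";", &state))
    {
        // tok is checked against nullptr before every use
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Like the loop above, this skips empty fields: `L"a;;b"` yields two tokens, not three.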

On the other hand, since you are using C++, you should consider using `std::wistringstream` and `std::getline()` instead of `wcstok()`:

#include <string>
#include <sstream>

// after LoadString() returns, resid contains the
// number of characters copied into readMap (0 on failure)...
std::wistringstream iss(std::wstring(readMap, resid));

std::wstring line;
while (std::getline(iss, line, L';'))
{
    // use line as needed...
}
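The same idea can be sketched as a standalone function that does not depend on the Win32 resource call. `splitFields` is a hypothetical name, and the input is assumed to already be in a `std::wstring`.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on ';' using std::getline.
std::vector<std::wstring> splitFields(const std::wstring& text)
{
    std::vector<std::wstring> fields;
    std::wistringstream iss(text);
    std::wstring field;

    // getline returns the stream, which tests false once no more
    // input can be extracted, so no explicit NULL check is needed.
    while (std::getline(iss, field, L';'))
    {
        fields.push_back(field);
    }
    return fields;
}
```

One behavioral difference from `wcstok()` is worth knowing: `std::getline()` keeps empty fields, so `L"a;;b"` produces three entries, the middle one empty.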

But either way, this statement is just plain wrong:

string str(wstr.begin(), wstr.end()); // Problem

This statement converts correctly only if the `std::wstring` contains nothing but ASCII characters in the U+0000 to U+007F range, because it copies each `wchar_t` into a `char` with no encoding step. For non-ASCII characters you have to perform a real data conversion instead; otherwise characters above U+00FF are truncated and their data is lost.
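To make the data loss concrete, here is a small sketch that isolates the iterator-range construction in a hypothetical helper, `narrowByTruncation`. The name is an assumption for illustration; the body is the same construction used in the question.

```cpp
#include <string>

// Hypothetical helper reproducing the question's conversion:
// every wchar_t is narrowed to exactly one char, with no encoding step.
std::string narrowByTruncation(const std::wstring& w)
{
    return std::string(w.begin(), w.end());
}
```

For `L"foo"` the result is `"foo"`, as expected. For `L"\u20AC"` (the euro sign) the result is a single byte, whereas the UTF-8 encoding of that character is the three bytes `E2 82 AC`, so the original character cannot be recovered from the narrow string.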

Since you are running on Windows, you can use the Win32 API WideCharToMultiByte() function:

std::wstring line;
while (std::getline(iss, line, L';'))
{
    std::string str;

    // optionally substitute CP_UTF8 with any ANSI codepage you want...
    // the API takes int lengths, so cast explicitly from size_t
    int wlen = static_cast<int>(line.length());
    int len = WideCharToMultiByte(CP_UTF8, 0, line.c_str(), wlen, NULL, 0, NULL, NULL);
    if (len > 0)
    {
        str.resize(len);
        WideCharToMultiByte(CP_UTF8, 0, line.c_str(), wlen, &str[0], len, NULL, NULL);
    }

    // use str as needed...
    MessageBoxW(0, line.c_str(), L"line", 0);
    MessageBoxA(0, str.c_str(), "str", 0);
}

Or, if you are using C++11 or later, you can use the `std::wstring_convert` class (only for UTF-8/16/32 conversions, though, and note that it is deprecated as of C++17):

#include <locale>
#include <codecvt>

std::wstring line;
while (std::getline(iss, line, L';'))
{
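    // wchar_t is UTF-16 on Windows, so use codecvt_utf8_utf16
    // (codecvt_utf8<wchar_t> would only handle UCS-2 there)
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> conv;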
    std::string str = conv.to_bytes(line);

    // use str as needed...
    MessageBoxW(0, line.c_str(), L"line", 0);
    MessageBoxA(0, str.c_str(), "str", 0);
}
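
If the narrow string is only wanted for comparisons, as the comments on the question suggest, the conversion can be avoided altogether, since `std::wstring` compares directly against wide literals and other wide strings. A minimal sketch, where `containsToken` is a hypothetical helper and not part of the original code:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `wanted` is one of the
// ';'-separated fields of `list`. No narrowing conversion is involved.
bool containsToken(const std::wstring& list, const std::wstring& wanted)
{
    std::wistringstream iss(list);
    std::wstring field;
    while (std::getline(iss, field, L';'))
    {
        if (field == wanted) // wide-to-wide comparison
            return true;
    }
    return false;
}
```

With the sample string table value, `containsToken(L"blah;blah;foo;bar;car;star", L"car")` is true, while a partial match such as `L"ca"` is not.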
Remy Lebeau