How to split char pointer with multiple delimiters & return array of char pointers in c++?

Question

In the duplicate of this question Split char* to char * Array it is advised to use string rather than char*. But I need to work with LPWSTR. Since it's a typedef of char*, I prefer to use char*. I tried with the following code, which gives the wrong output:

char**splitByMultipleDelimiters(char*ori,char deli[],int lengthOfDelimiterArray)
{
    char*copy = ori;
    char** strArray = new char*[10];
    int j = 0;
    int offset = 0;
    char*word = (char*)malloc(50);
    int length;
    int split = 0;
    for(int i = 0; i < (int)strlen(ori); i++)
    {
        for(int k = 0; (k < lengthOfDelimiterArray) && (split == 0);k++)
        {
            if(ori[i] == deli[k])
            {
                split = 1;
            }
        }
        if(split == 1)//ori[i] == deli[0]
        {
            length = i - offset;
            strncpy(word,copy,length);
            word[length] = '\0';
            strArray[j] = word;
            copy = ori + i + 1;
            //cout << "copy: " << copy << endl;
            //cout << strArray[j] << endl;
            j++;
            offset = i + 1;
            split = 0;
        }
    }
    strArray[j] = copy;
   // string strArrayToReturn[j+1];
    for(int i = 0; i < j+1; i++)
    {
        //strArrayToReturn[i] = strArray[i];
        cout << strArray[i] << endl;
    }
    return strArray;
}

void main()
{
        char*ori = "This:is\nmy:tst?why I hate";
        char deli[] = {':','?',' ','\n'};

        int lengthOfDelimiterArray = (sizeof(deli)/sizeof(*deli));
        splitByMultipleDelimiters(ori,deli,lengthOfDelimiterArray);
}

Are there any other ways to split LPWSTR?
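
One likely reason for the wrong output in the code above: every entry except the last is set to the same `word` buffer, so those entries all show whichever token was copied most recently. The fixed sizes (`new char*[10]`, `malloc(50)`) can also overflow on longer input. Below is a minimal sketch that gives each token its own buffer, assuming the delimiters are passed as a null-terminated string; `splitOwned` and `freeTokens` are hypothetical names, not part of the original code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: splits `text` on any character in the null-terminated
// `delims` string. Returns a new[]-allocated array of new[]-allocated tokens
// and stores the number of tokens in `count`. Empty tokens are skipped.
char** splitOwned(const char* text, const char* delims, std::size_t& count)
{
    const std::size_t length = std::strlen(text);
    // Each token needs at least one character plus a separating delimiter,
    // so length / 2 + 1 slots are always enough.
    char** tokens = new char*[length / 2 + 1];
    count = 0;

    const char* p = text;
    while (*p != '\0')
    {
        p += std::strspn(p, delims);                    // skip leading delimiters
        const std::size_t n = std::strcspn(p, delims);  // length of the next token
        if (n == 0)
            break;                                      // only delimiters were left
        char* token = new char[n + 1];                  // separate buffer per token
        std::memcpy(token, p, n);
        token[n] = '\0';
        tokens[count++] = token;
        p += n;
    }
    return tokens;
}

// Releases everything allocated by splitOwned.
void freeTokens(char** tokens, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        delete[] tokens[i];
    delete[] tokens;
}
```

The caller owns the result and should pass it to `freeTokens` when done. Returning a `std::vector<std::string>` avoids that manual cleanup, which is why the answers lean toward the standard containers.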

Use `LPWSTR` only where you need it. Why make it harder to process the string simply because you'd need a conversion to a C string later (which can usually be done with `c_str`)? – chris, Aug 07 '15 at 17:21
LPWSTR isn't a pointer to an array of char. It's a pointer to an array of [wide char](https://en.wikipedia.org/wiki/Wide_character). You're going to have to shift your thinking to Unicode, my friend. – user4581301, Aug 07 '15 at 17:38
*"Wrong output"* is not an error description. Since no one is going to read your mind to understand what you expected the *correct output* to be, this isn't very helpful. When describing an error, always include both *expected* behavior and *observed* behavior. – IInspectable, Aug 08 '15 at 14:26

score 0 · Accepted Answer · edited May 23 '17 at 12:29

Wait, what are you talking about? I don't see LPWSTR anywhere in your code. Are you trying to convert to LPWSTR? If so, there's a standard library function for that. There's also a standard library-based solution for splitting over multiple chars. So all together, your code might look like this:

#include <codecvt>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

using std::string;
using std::wstring;

// Converts a UTF-8 encoded narrow string to a UTF-16 wide string.
// Note: std::wstring_convert is deprecated since C++17.
wstring toWide(const string &original)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(original);
}

// Splits `original` on any character in `delimiters`, skipping empty tokens.
std::vector<wstring> splitMany(const string &original, const string &delimiters)
{
    std::vector<wstring> wordVector;
    std::stringstream stream(original);
    std::string line;

    while (std::getline(stream, line))
    {
        std::size_t prev = 0, pos;
        while ((pos = line.find_first_of(delimiters, prev)) != std::string::npos)
        {
            if (pos > prev)
                wordVector.push_back(toWide(line.substr(prev, pos - prev)));
            prev = pos + 1;
        }
        if (prev < line.length())
            wordVector.push_back(toWide(line.substr(prev, std::string::npos)));
    }
    return wordVector;
}

int main()
{
    string original = "This:is\nmy:tst?why I hate";
    string separators = ":? \n";

    std::vector<wstring> results = splitMany(original, separators);
}

This code uses the standard library for these functions and is much less error-prone than doing it manually.

Good luck!

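Edit: To be clear, `LPWSTR` is a typedef for `wchar_t*`. A `wstring` is not itself a pointer, but it stores `wchar_t` characters, and `c_str()` gives you a `const wchar_t*` (an `LPCWSTR`) to its contents.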

Edit 2: To convert a wstring to a string:

#include <codecvt>
#include <locale>
#include <string>

using std::string;
using std::wstring;

string toMultiByte(const wstring &original)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(original);
}
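
The round trip through a narrow string is not strictly needed for splitting, because `std::wstring` offers the same `find_first_of` and `substr` members as `std::string`. The following is a minimal sketch of that idea, assuming the input is a null-terminated `wchar_t` buffer (which is what an `LPWSTR` points to on Windows); `splitWide` is a hypothetical name used only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits wide text on any character in `delimiters`,
// skipping empty tokens. It works on wchar_t data directly, so no
// narrow/wide conversion is involved.
std::vector<std::wstring> splitWide(const wchar_t* text, const std::wstring& delimiters)
{
    const std::wstring source(text);  // copies up to the terminating L'\0'
    std::vector<std::wstring> tokens;
    std::size_t prev = 0, pos;
    while ((pos = source.find_first_of(delimiters, prev)) != std::wstring::npos)
    {
        if (pos > prev)
            tokens.push_back(source.substr(prev, pos - prev));
        prev = pos + 1;
    }
    if (prev < source.length())
        tokens.push_back(source.substr(prev));
    return tokens;
}
```

A call such as `splitWide(buffer, L":? \n")` returns the tokens as `std::wstring` values; `c_str()` on each one yields a `const wchar_t*` if an API needs a pointer again.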

Answered Aug 07 '15 at 17:51 by James Ko; edited May 23 '17 at 12:29 by Community

Thanks for your answer. Are there any standard functions to split or parse LPWSTR other than converting and splitting? – Janu Aug 07 '15 at 18:00
@Janu I don't understand; you're asking how to convert or split `wstring` without converting or splitting them? If you're asking how to do something like parse a number from a `wstring`, then yes there is a [standard library utility](http://stackoverflow.com/a/5119932/4077294) to parse numbers from a `wstring`. Basically all the things you can do with `string` are available for `wstring` as well, check out [the C++ reference for wstring](http://www.cplusplus.com/reference/string/wstring/). – James Ko Aug 07 '15 at 18:06
I need to parse LPWSTR (eg: computer name: Janu\nuser name: Janaki) and retrieve Janu as computer name and Janaki as user name. To retrieve like this, I need to split the LPWSTR. – Janu Aug 07 '15 at 18:12
@Janu Then you'll have to convert from `std::wstring` to `string`. See my updated answer for instructions on how to do this. – James Ko Aug 07 '15 at 18:16
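
For the `computer name: Janu\nuser name: Janaki` example, a conversion to `string` may not be required, since the wide stream and string classes can parse the text as it is. Here is a rough sketch under the assumption that every line has the form `key: value`; `parseFields` is a hypothetical helper, not something from the answer.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: parses lines of the form L"key: value" into a map.
// Lines without a colon are ignored.
std::map<std::wstring, std::wstring> parseFields(const wchar_t* text)
{
    std::map<std::wstring, std::wstring> fields;
    const std::wstring source(text);
    std::wistringstream stream(source);
    std::wstring line;
    while (std::getline(stream, line))
    {
        const std::size_t colon = line.find(L':');
        if (colon == std::wstring::npos)
            continue;  // not a "key: value" line
        // Skip the spaces that follow the colon.
        const std::size_t start = line.find_first_not_of(L' ', colon + 1);
        fields[line.substr(0, colon)] =
            (start == std::wstring::npos) ? std::wstring() : line.substr(start);
    }
    return fields;
}
```

With the sample input, looking up `L"computer name"` and `L"user name"` in the returned map gives the two values.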
@JamesKo This is Windows, the source string is unlikely to be UTF8. – Jonathan Potter Aug 07 '15 at 21:39
Windows uses two encodings: MBCS (code page) and UTF-16. Interpreting an MBCS encoded string as UTF-8 will not end well. – IInspectable Aug 08 '15 at 14:23
@JamesKo I tried to do with string. wordVector is not defined. Based on code, it must be vector. It prints correct result. But when I tried to convert the output to wstring using "wstring toWide(const string &original)" it prints ??. – Janu Aug 08 '15 at 16:53
1. Are you using `wcout`? `std::cout` doesn't work with `wstring`. 2. It looks like I confused `char*` with UTF-8, try something like [this](http://blog.mijalko.com/2008/06/convert-stdstring-to-stdwstring.html) if your string only contains ASCII characters. – James Ko Aug 08 '15 at 17:02
Oh boy, this gets worse and worse. The link you posted simply truncates and trashes data. If you need to convert from MBCS to Unicode, do convert from MBCS to Unicode. The tool is called [MultiByteToWideChar](https://msdn.microsoft.com/en-us/library/windows/desktop/dd319072.aspx). Everything else is just an excuse for not understanding character encodings... – IInspectable Aug 10 '15 at 11:07
@IInspectable Yes, but that's specific to the Windows desktop. I would have posted about `mbstowcs`, but I'm not sure if that's thread safe. – James Ko Aug 10 '15 at 17:30
This API is **not** Windows Desktop only. It's also available for Windows Store and Windows Phone. Hardly a limitation, for a question tagged *winapi*. If you are looking for a thread-safe alternative in C++: [std::mbsrtowcs](http://en.cppreference.com/w/cpp/string/multibyte/mbsrtowcs). Regardless of that, your answer assumes UTF-8, which is wrong. This should not be the accepted answer. – IInspectable Aug 10 '15 at 17:36
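
As a portable illustration of the `std::mbsrtowcs` route mentioned above, here is a minimal sketch that converts a narrow multibyte string to a wide string using the encoding of the current C locale (whatever `std::setlocale` has selected, or the default "C" locale). `widenWithLocale` is a hypothetical name. On Windows, `MultiByteToWideChar` with an explicit code page is the native way to do the same job.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a multibyte string in the current C locale's
// encoding to a wide string. Throws if the input is not valid in that encoding.
std::wstring widenWithLocale(const char* narrow)
{
    std::mbstate_t state = std::mbstate_t();
    const char* src = narrow;
    // With a null destination, mbsrtowcs only reports the required length
    // (not counting the terminating null character).
    const std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");

    std::vector<wchar_t> buffer(len + 1);
    src = narrow;
    state = std::mbstate_t();
    std::mbsrtowcs(buffer.data(), &src, buffer.size(), &state);
    return std::wstring(buffer.data(), len);
}
```

Because the conversion state is kept in a local `std::mbstate_t`, the function does not rely on the hidden global state that `mbstowcs` may use.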
