
I have a `main()` on Linux that receives its command-line arguments as `char**`:

```cpp
int main(int argc, char* argv[])
```

In my cross-platform program I want to use the command-line arguments as `char16_t`, so I need to convert `char` to `char16_t`. How do I do that? I tried the following, but once I leave the loop my debugger shows strange characters in the array. What did I do wrong?

```cpp
//u16String <- string
std::u16string u16string_from_string(std::string const str) {
  std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
  return converter.from_bytes(str);
}

bool convert_argv_to_char16(int argc,char * argv[]){
  char16_t * argv16[argc];
  int i=0;

  // for each arg
  for (char **arg = argv; *arg; ++arg) {
      std::string S(*arg);
      std::u16string S16 = u16string_from_string(S);
      const char16_t* cc_16 = S16.c_str();
      char16_t* c_16 = (char16_t*) cc_16;
      argv16[i] = c_16;
      i++;
  }
  return true;
}
```
user3443063
• The pointer you store in `argv16[i]` becomes invalid once the `std::u16string` goes out of scope. – Botje Jun 30 '23 at 08:45
• Whenever you feel the need to do a C-style cast when programming in C++, you should take that as a sign that you're probably doing something wrong. Casting away `const` is usually one such thing which is wrong. – Some programmer dude Jun 30 '23 at 08:46
• As for your problem, why not simply convert to a `std::vector`? What do you need the array for later? – Some programmer dude Jun 30 '23 at 08:47
• Messing with pointers is what you did wrong. You have no choice with `argv` but `argv16` should be `std::vector`. Probably you think you need the pointers for some reason, but probably you are wrong. – john Jun 30 '23 at 08:54
• Also `char16_t * argv16[argc];` is simply not legal C++. In C++ [array bounds must be constant](https://stackoverflow.com/questions/39334435/variable-length-array-vla-in-c-compilers). – john Jun 30 '23 at 08:55
• BTW: you seem to use UTF-16 in your application throughout platforms. This is a viable solution, but I would highly suggest you use **UTF-8** on all cross-platform code, only converting to UTF-16 when needed (typically on Windows). This is what we are doing in our company. And now UTF-8 can finally be used as a locale in Windows 10: so the conversion to UTF-16 is theoretically no more needed. – prapin Jun 30 '23 at 10:53
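The comments' suggestions (let a `std::vector` own the converted strings instead of storing raw pointers, and drop both the variable-length array and the `const`-removing cast) can be combined into a small sketch. `convert_args` and `make_pointer_array` are hypothetical names; the conversion reuses the question's `std::wstring_convert` approach and assumes the arguments are UTF-8 encoded.

```cpp
#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>
#include <vector>

// Same conversion as in the question (std::wstring_convert is declared in
// <locale>, std::codecvt_utf8_utf16 in <codecvt>; both deprecated since C++17).
std::u16string u16string_from_string(const std::string& str) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
    return converter.from_bytes(str);
}

// Hypothetical helper: the returned vector owns every converted argument,
// so nothing dangles as long as the caller keeps the vector alive.
std::vector<std::u16string> convert_args(int argc, char* argv[]) {
    std::vector<std::u16string> args;
    args.reserve(static_cast<std::size_t>(argc));
    for (int i = 0; i < argc; ++i) {
        args.push_back(u16string_from_string(argv[i]));
    }
    return args;
}

// Hypothetical helper, only needed if some API insists on an argv-style array.
// The pointers point into the strings owned by `args`, so they stay valid
// only while `args` is alive and unmodified.
std::vector<const char16_t*> make_pointer_array(const std::vector<std::u16string>& args) {
    std::vector<const char16_t*> ptrs;
    ptrs.reserve(args.size() + 1);
    for (const std::u16string& s : args) {
        ptrs.push_back(s.c_str());
    }
    ptrs.push_back(nullptr);  // terminate the array like argv[argc]
    return ptrs;
}
```

In `main()` this might look like `auto args = convert_args(argc, argv);`, with `args` kept alive for as long as the converted strings, or any pointers obtained from `make_pointer_array(args)`, are in use. Returning the vector by value is cheap because it is moved or the copy is elided, so there is no need to manage the memory by hand.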

1 Answer


Each `std::u16string S16` in your loop is destroyed at the end of its iteration, so the pointer you stored from `S16.c_str()` dangles. To fix this, the converted strings must live at least as long as any pointer into them. One way to achieve this is to store them in a `std::vector<std::u16string>`: the vector owns the strings, and `argv16[i].c_str()` stays valid for as long as the vector is alive and not modified.

```cpp
#include <codecvt>
#include <locale>   // std::wstring_convert
#include <string>
#include <vector>

// u16string <- string (UTF-8 -> UTF-16)
std::u16string u16string_from_string(const std::string& str) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
    return converter.from_bytes(str);
}

bool convert_argv_to_char16(int argc, char* argv[]) {
    std::vector<std::u16string> argv16;

    // Convert each arg and store it in the vector, which owns the strings
    for (int i = 0; i < argc; ++i) {
        argv16.push_back(u16string_from_string(argv[i]));
    }

    // From here until the function returns, argv16[i].c_str() yields a
    // const char16_t* that stays valid as long as argv16 is not modified.

    return true;
}
```

Because the `std::vector<std::u16string>` owns the converted strings, pointers obtained with `c_str()` are valid for the rest of the function. When `convert_argv_to_char16` returns, however, the vector and its strings are destroyed, so any pointer taken from them becomes invalid as well. If you need the converted arguments elsewhere, return the vector (or create it in `main()`) and keep it alive for as long as you use them.
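The conversion above relies on `std::wstring_convert` and `std::codecvt_utf8_utf16`, both deprecated since C++17. If you would rather avoid them, a hand-written UTF-8 to UTF-16 decoder is short. The sketch below uses a hypothetical name, `utf8_to_utf16`, and assumes the arguments are UTF-8 (on Linux they are raw bytes, which are typically UTF-8 under a UTF-8 locale). Like `from_bytes` without an error string, it throws `std::range_error` on malformed input.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical replacement for u16string_from_string that avoids <codecvt>.
// Rejects bad lead/continuation bytes, truncated sequences, overlong forms,
// encoded surrogates and code points above U+10FFFF.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::range_error("invalid UTF-8 lead byte");

        if (i + len > in.size()) throw std::range_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) throw std::range_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }

        // Smallest code point each sequence length may encode (overlong check).
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::range_error("invalid UTF-8 code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // fits in one UTF-16 unit
        } else {
            cp -= 0x10000;                             // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

It can be dropped into either version of the loop in place of `u16string_from_string(argv[i])`; the ownership advice is unchanged, since the result is still a `std::u16string` that must outlive any pointer taken from it.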