
As I understand it, Boost.Filesystem uses the native locale encoding, and I use ICU's UnicodeString instead of std::string because it handles Unicode. However, I want to convert my UnicodeString to some kind of std::string in the native locale encoding. How would I do this? I'd like to avoid using C strings.

ildjarn
Jookia
  • I'm not sure about the "native encoding" part. Filesystems don't usually *have* a notion of encoding; file names are just opaque byte strings. – Kerrek SB Sep 10 '11 at 10:24
  • @Kerrek: Maybe UNIX filesystems don't, but in Windows, filenames are UTF-16 strings. You can use "opaque byte strings" if you want, but you'd be limited to ASCII or a codepage or some such. – Nicol Bolas Sep 10 '11 at 11:49
  • @Nicol: That's not true. NTFS has 16-bit filenames, yes, but there's no encoding. "Opaque 16-bit strings" is the situation for NTFS. – Kerrek SB Sep 10 '11 at 12:05
  • @Kerrek: NTFS does not specify the encoding, but _Windows_ does. You may indeed store a non-UTF-16 string of 16-bit values, but you will also find that Windows will not display the names (since they're not valid Unicode strings) or anything of the like. Boost.Filesystem is built on top of the OS facilities, which on Windows machines means playing by Windows's rules. – Nicol Bolas Sep 10 '11 at 12:26
  • @Nicol: Are you saying that `_wfopen()` will cause an error if you pass it a wide string that is not a valid UTF-16 sequence? – Kerrek SB Sep 10 '11 at 12:29
  • The point is I want to have Unicode path support cross platform. – Jookia Sep 10 '11 at 13:23
  • possible duplicate of [ICU C++ Converting Encodings](http://stackoverflow.com/questions/7372328/icu-c-converting-encodings) – tchrist Sep 10 '11 at 18:55
  • @Kerrek: I don't know what `_wfopen` will do if the string you pass it isn't UTF-16. But I do know that the only way to get a non-UTF-16 string to `_wfopen` would be to make one up out of whole cloth or otherwise break an existing UTF-16 string. And Windows Explorer might have some issues displaying it. So I wouldn't test whether or not Microsoft sanitizes the inputs to their file IO or not. – Nicol Bolas Sep 10 '11 at 21:33
  • Actually, NTFS says UTF-16. It does not try to sanitize things, or validate for correctness, but the spec says it is UTF-16, and expects UTF-16. Even more, the NTFS volume info contains a character conversion table, used to prevent you from creating T.txt and t.txt in the same folder. So there is some case conversion happening there. And that is done on the file name as if it is UTF-16. So the NTFS "contract" says UTF-16, but if you pass something else, you are breaking the contract. In UNIX you can set LANG to ja.UTF8 or ja.EUC_JP and create files as you want. No contract, it's all on you. – Mihai Nita Oct 07 '11 at 09:39

1 Answer


I think this gives the answer you're looking for:

Conversions to native format

If tl;dr:

boost::filesystem::path p;
// ...
std::string native = p.string();
NuSkooler