Introduced in C++17, `std::filesystem::u8path` seems to be deprecated in C++20.

What is the reason for this choice? What should I use in C++17? What should I use in C++20?

Guillaume Gris
  • 1
    Not sure why it was ever there, it seems like `std::filesystem::path` has a constructor that performs the same as that function. – Fantastic Mr Fox Jan 02 '19 at 09:40
  • 2
    From the `path` page on cppreference: “For portable pathname generation from Unicode strings, see u8path” – Guillaume Gris Jan 02 '19 at 09:45
  • Hmmm, true. But it also says: *If the source character type is char, the encoding of the source is assumed to be the native narrow encoding* (for constructor 6). I am pretty sure this covers Unicode where Unicode is the native format. It says the same in u8path: *If path::value_type is char and native encoding is UTF-8, constructs a path directly*. Maybe the note is unnecessary and the path constructor does the right thing? – Fantastic Mr Fox Jan 02 '19 at 09:52
  • 1
    The native narrow encoding is UTF-8 on Unix systems; on Windows, it's more complicated according to this question: https://stackoverflow.com/questions/4649388/what-is-the-native-narrow-string-encoding-on-windows – Guillaume Gris Jan 02 '19 at 09:59
  • 2
    Because in C++20 the `path` constructor supports construction from `char8_t` (a new fundamental type for representing UTF-8 encoded values). I've updated https://en.cppreference.com/w/cpp/filesystem/path/path to reflect this. – cpplearner Jan 02 '19 at 10:50
  • 2
    @cpplearner that should be an answer, not a comment – Caleth Jan 02 '19 at 10:54
  • Because the C++ standards committee has no vision for the future, nor clear design goals in their gigantic mess of a language. – Martin Oct 28 '22 at 13:27

1 Answer

Because, thanks to the existence of the C++20 feature char8_t, this will work:

path p(u8"A/utf8/path");

u8path existed so that a UTF-8 string could be told apart from a narrow character string. Since C++20 gives us an actual type for that, it is no longer necessary.
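
To make the difference concrete, here is a minimal sketch. The function name `make_utf8_path` is invented for this example and is not a standard facility. Built as C++20, the `u8` literal has element type `char8_t`, so the `path` constructor can tell that the source is UTF-8. Built as C++17, the same literal is plain `char` and is read in the native narrow encoding, which is the ambiguity u8path was meant to resolve.

```cpp
#include <filesystem>
#include <type_traits>

namespace fs = std::filesystem;

#ifdef __cpp_char8_t
// With char8_t enabled (C++20), a u8 literal is an array of char8_t,
// a type distinct from char, so the encoding is visible in the type.
static_assert(std::is_same_v<decltype(u8"x"), const char8_t (&)[2]>);
#endif

// Hypothetical helper, named only for this illustration.
fs::path make_utf8_path() {
    // C++20: the source is char8_t, so it is interpreted as UTF-8.
    // C++17: the literal is const char[], so it is interpreted in the
    //        native narrow encoding instead.
    return fs::path(u8"A/utf8/path");
}
```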


What should I use in C++17?

Use u8path. Deprecation does not mean removed or inaccessible. It merely means subject to eventual removal.
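
For a C++17 code base, that could look like the sketch below. `utf8_to_path_cpp17` is a made-up name; the only standard call involved is `std::filesystem::u8path`. Compiled as C++20 this should still build, though the compiler may emit a deprecation warning.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: the argument is assumed to hold UTF-8 bytes.
fs::path utf8_to_path_cpp17(const std::string& utf8) {
    // u8path converts from UTF-8 to the platform's native path encoding.
    return fs::u8path(utf8);
}
```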

Nicol Bolas
  • 5
    What should I do if I have a UTF-8 encoded `std::string`? – jhasse Apr 23 '20 at 08:51
  • 1
    @jhasse: As stated in the post, "Deprecation does not mean removed or inaccessible." Use `u8path` for now, with the intent that you're eventually going to transition to having a proper UTF-8 encoded `u8string`. – Nicol Bolas Apr 23 '20 at 13:25
  • 3
    @NicolBolas thanks for the answer. But if I already have a UTF-8 encoded `std::string`, how should I construct a `path` from it? – Taw Oct 05 '22 at 16:08
  • 1
    @Taw: My answer has not changed: use `u8path` for now. – Nicol Bolas Oct 05 '22 at 16:18
  • In the past three years, I’ve been using a `u8path` wrapper in my filesystem utilities which just calls `std::filesystem::u8path` with the warning suppressed – Guillaume Gris Oct 28 '22 at 14:07
  • "*Deprecation does not mean _removed_ or _inaccessible_. It merely means subject to _eventual_ removal.*" Which means your code will _eventually_ break. – Martin Oct 28 '22 at 15:05
  • @Martin: Sure... eventually. What's your point? Eventually, you ought to be using `char8_t` to mean "string encoded in UTF-8". – Nicol Bolas Oct 28 '22 at 15:19
  • 5
    @NicolBolas if you don't see anything wrong with a 40-year-old language still being at the stage where they introduce features in a version only to deprecate them in the next, then there's no point in having this discussion. – Martin Oct 30 '22 at 01:26
  • @Martin: I don't see what the age of C++ has to do with anything here. `u8path` was added in C++17 and deprecated in C++20. Also, would you prefer that we just not have a way to make filesystem paths from UTF-8-encoded strings at all? Because that was the choice. Sure, it would have been nice if the committee recognized in C++11 the importance of having `char8_t`. But that mistake was made 6 years before C++17. – Nicol Bolas Oct 30 '22 at 01:32
  • @NicolBolas because adding a feature in a revision and deprecating it in the next is understandable in toy languages made by hobbyists, not in decades-old ISO standards defined by a committee of experts. Again, I see no point in continuing this discussion if you don't understand this very basic fact. – Martin Oct 31 '22 at 08:49
  • @Martin: But that's exactly what a "committee of experts" is likely to produce. They solve whatever problem is presented at the time, without looking at what features might be available in the future, because they're focused on the problem at hand. Each proposal is its own locked-off entity and needs to do what it needs to do. Filesystem was proposed at a time when there was no type-based way to distinguish UTF-8 strings. So it was either add a function or don't allow UTF-8 strings. The option to add a UTF-8 string was out of scope for the filesystem proposal. – Nicol Bolas Oct 31 '22 at 13:27
  • 1
    @NicolBolas a committee of experts should have at least some degree of vision for the future, and design new features in a way that they can interact nicely with the rest of the language. This is such a fundamentally basic concept that I hardly know how to make it clearer. Funnily enough, I just found your name in an old isocpp mailing list thread regarding this exact same issue, where you very arrogantly dismissed a tentative char8_t proposal. I wonder what made you change your mind? – Martin Nov 01 '22 at 15:18
  • @Martin: "*a committee of experts should have at least some degree of vision for the future, and design new features in a way that they can interact nicely with the rest of the language.*" That's not how the C++ committee has ever worked. "New features" are brought to the committee and discussed. And those features are considered based on what is brought to them. So when presented with `filesystem`, the committee had three options: tell them to return with a language change incorporating a UTF-8 character string, adopt the proposal as-is, or throw it out. They chose the useful option. – Nicol Bolas Nov 01 '22 at 15:27
  • @Martin: "*you very arrogantly dismissed a tentative char8_t proposal*" I have no particular recollection of that. Maybe there was something wrong with the proposal as written (for example, if `char8_t` implicitly converted to `char` or something). Maybe I thought it would be better to have `char` always be assumed to be UTF-8. Maybe something else. – Nicol Bolas Nov 01 '22 at 15:38
  • @Nicol Bolas, in C++20 `u8path` stopped working, at least in Visual Studio. Your answer is not correct. – Optimus1 Nov 29 '22 at 08:58
  • @Optimus1: What do you mean by "stopped working"? If it doesn't compile or execute, that's a [violation of the standard](https://timsong-cpp.github.io/cppwp/depr.fs.path.factory). – Nicol Bolas Nov 29 '22 at 14:48
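
For the question that recurs in the comments above (a UTF-8 encoded `std::string` under C++20), one possible approach is sketched here. It is not taken from the answer. The idea is to copy the bytes into a `std::u8string` and hand that to the `path` constructor, falling back to `u8path` when the library has no `char8_t` support. The name `path_from_utf8` is hypothetical, and the input is assumed to be valid UTF-8.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper, not part of the standard library.
// Assumes `utf8` really contains UTF-8 encoded bytes.
fs::path path_from_utf8(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20: copy each byte into a char8_t string so the constructor
    // treats the source as UTF-8 rather than native narrow encoding.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: no char8_t yet, so u8path is the way to say "this is UTF-8".
    return fs::u8path(utf8);
#endif
}
```

The C++20 branch pays for one extra string copy. In exchange, call sites avoid the deprecated function and do not need to suppress any warning.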