
I'm trying to use UTF-8 std::string and std::filesystem::path in a foolproof way on Windows 10+, where the "Beta: Use Unicode UTF-8 for worldwide language support" setting is likely off. C++17 or above.

To go from a path to a UTF-8 string, I have to use path.u8string() instead of path.string().

And to go from a UTF-8 string to a path, I have to use u8path() instead of path().

How can I get the compiler to help me catch calls to the "wrong" functions? This includes implicit construction of path, for example:

#include <filesystem>
#include <string>

void foo(const std::filesystem::path& path) {}

void example() {
    std::string pathString;                   // UTF-8 string.
    foo(pathString);                          // Wrong: path(std::string) expects the native encoding.
    foo(std::filesystem::u8path(pathString)); // Right: always use u8path().

    std::filesystem::path path;
    auto native = path.string();              // Wrong: returns the native encoding.
    auto utf8 = path.u8string();              // Right: always use u8string().
}
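One way to shrink the number of places where the wrong call can slip in is to route every conversion through two small helpers, so only those two functions ever touch u8path()/u8string() directly. This is a minimal sketch; the names pathFromUtf8 and pathToUtf8 are hypothetical. It also papers over the C++20 changes: u8path() is deprecated there and u8string() returns std::u8string instead of std::string.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helpers: the only two places in the code base that convert
// between a UTF-8 std::string and std::filesystem::path. Everything else calls
// these instead of path(std::string), u8path(), string() or u8string().

std::filesystem::path pathFromUtf8(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20 with char8_t: u8path() is deprecated, so build the path from a
    // std::u8string; path always interprets char8_t data as UTF-8.
    return std::filesystem::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path() interprets its argument as UTF-8.
    return std::filesystem::u8path(utf8);
#endif
}

std::string pathToUtf8(const std::filesystem::path& p) {
    // u8string() returns std::string in C++17 and std::u8string in C++20;
    // copying through iterators yields a UTF-8 std::string in both cases.
    const auto utf8 = p.u8string();
    return std::string(utf8.begin(), utf8.end());
}
```

Call sites then read foo(pathFromUtf8(pathString)) and pathToUtf8(path), and a plain text search for the raw functions outside these helpers becomes a usable check. It does not stop the implicit path(std::string) conversion by itself.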

Can I force the C++ compiler to warn me when specific functions are called, on MSVC, GCC or Clang (preferably all three)? Or is there some other easy way to prevent myself from making these errors?

Update Jan 25, 2021

I've been able to fix my underlying issue by calling setlocale(LC_ALL, ".UTF-8"); before any calls to std::filesystem::path() and std::filesystem::path::string(). This seems to make the MSVC standard library accept and give out strings as UTF-8 on Windows 7+ with Visual C++ 2019 (relevant Microsoft STL issue).
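A minimal sketch of that startup call, assuming it runs at the very beginning of main() before any path is built from or converted to a narrow string. ".UTF-8" is the locale name accepted by the Microsoft UCRT; other C libraries (glibc, for example) may not recognize it, in which case setlocale() returns nullptr and the locale is left unchanged. The function name enableUtf8Locale is hypothetical.

```cpp
#include <clocale>
#include <cstdio>

// Switch the C runtime locale to UTF-8 once, before any narrow-string
// std::filesystem::path conversion. Per the update above, with the MSVC STL
// this makes path(std::string) and path::string() use UTF-8 instead of the
// ANSI code page. Returns false (and leaves the locale unchanged) if the
// C library does not know the ".UTF-8" locale name.
bool enableUtf8Locale() {
    if (std::setlocale(LC_ALL, ".UTF-8") == nullptr) {
        std::fputs("warning: UTF-8 locale not available\n", stderr);
        return false;
    }
    return true;
}
```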

After that I found Use UTF-8 code pages in Windows apps (can't believe it took me so long to find), which also makes std::filesystem::path work in UTF-8, but additionally makes argv, envp and all the ANSI ("A"-suffixed) Win32 API functions use UTF-8 (requires Windows 10, version 1903).
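The manifest route (an activeCodePage element set to UTF-8 inside windowsSettings) is silently ignored on older Windows builds or if the manifest is not embedded, so a cheap runtime check helps keep it foolproof. A sketch, assuming you call it at startup and treat false as a configuration error; GetACP() returns CP_UTF8 (65001) when the setting took effect. The function name and the non-Windows fallback are assumptions.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Returns true if the process ANSI code page is UTF-8, i.e. narrow strings
// given to the A-suffixed Win32 functions are interpreted as UTF-8.
bool processCodePageIsUtf8() {
#ifdef _WIN32
    return GetACP() == CP_UTF8;
#else
    // Assumption: on non-Windows targets narrow paths are plain bytes and
    // this code base treats them as UTF-8.
    return true;
#endif
}
```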

But the initial question is still valid: how do I warn about or prevent (poison) the use of arbitrary functions?

SephiRok
  • Are these functions implemented in header files or in a static library or shared object (DLL)? It's relatively easy to search the `objdump` output of an object file for unresolved references to a specific name (since the linker hasn't run yet). – Ben Voigt Jan 18 '22 at 16:57
  • @Ben: Do you mean the `.string()` and `path()` functions I don't want to call? They're part of the C++17 standard library. I have full access to my own code, from which I don't want to call these standard library functions. – SephiRok Jan 18 '22 at 17:19
  • Yes, I mean those functions. Are they defined inside `<filesystem>`, or declared there and defined inside `msvcrtxxxx.lib` or `libc++xxxx.a`? When the compiler runs, it generates a `.obj` or `.o` file which, in addition to the machine code for your own code, lists all identifiers the linker needs to resolve. That includes every identifier your program uses to call functions in the C++ standard runtime library (static or DLL), but not things defined in header files and inlined to remove the function call. – Ben Voigt Jan 18 '22 at 17:49

2 Answers


GCC and Clang have a poison pragma which you might find helpful here. Unfortunately, you can only poison identifiers, not fully qualified names, so while you can do:

#pragma GCC poison string

you cannot do (for example):

#pragma GCC poison std::filesystem::path::string

Still, maybe this is something you could wrap in a #ifdef and enable it periodically in a test build to weed out errors in your code.

Here's a fully worked example:

#include <filesystem>
#include <string>

#pragma GCC poison string

int main ()
{
    std::filesystem::path path;
    auto x = path.string();         // error: attempt to use poisoned "string"
    auto y = path.u8string();       // OK
}


Also, see this question for an MSVC solution. More #ifdefs coming to a screen near you :)
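A sketch of the "wrap it in an #ifdef and enable it in a test build" idea, assuming a hypothetical CHECK_PATH_ENCODING macro passed only in that build. Because poison works per identifier, it also rejects every later std::string in the same translation unit, which is why this suits an occasional check build rather than normal builds, and why code after the pragma sticks to auto. Clang defines __GNUC__ as well and accepts the GCC spelling of the pragma.

```cpp
#include <filesystem>
#include <string>

// CHECK_PATH_ENCODING is a hypothetical opt-in macro (-DCHECK_PATH_ENCODING).
// Headers included above are not affected; only tokens that follow are.
#if defined(CHECK_PATH_ENCODING) && defined(__GNUC__)
#pragma GCC poison string
#endif

// In the check build the identifier "string" may not appear below this point
// (comments are fine), so this avoids spelling out the return type.
auto toUtf8(const std::filesystem::path& p) {
    return p.u8string();  // OK: "u8string" is a different identifier
    // return p.string(); // would fail the check build: poisoned "string"
}
```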

Paul Sanders
  • Unfortunately this also doesn't catch the implicit `path(string)` constructor, which is the easiest one for a human to miss as well. – SephiRok Jan 18 '22 at 17:33

A hacky approach I've used in similar situations in the past: Edit the system header(s) and add [[deprecated]] to the functions that you don't want to use.
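Since the real standard headers shouldn't be shown modified here, this sketch uses a hypothetical stand-in class Path to illustrate what the edit amounts to. Deprecating the narrow-string constructor, not only string(), is what also flags the implicit conversion in foo(pathString). The fromUtf8 factory is an assumed replacement entry point, and the stand-in does no real encoding conversion.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for std::filesystem::path, showing the effect of
// [[deprecated]] on the declarations you want to avoid.
class Path {
public:
    Path() = default;

    // Deprecating this non-explicit constructor also flags implicit
    // conversions such as foo(pathString).
    [[deprecated("narrow strings use the native code page; use fromUtf8()")]]
    Path(const std::string& native) : bytes_(native) {}

    [[deprecated("returns the native code page; use u8string()")]]
    std::string string() const { return bytes_; }

    std::string u8string() const { return bytes_; }

    static Path fromUtf8(std::string utf8) {
        Path p;
        p.bytes_ = std::move(utf8);
        return p;
    }

private:
    std::string bytes_;  // simplification: bytes stored as-is, no conversion
};

void foo(const Path&) {}
```

With declarations like these, foo(pathString) and path.string() produce deprecation warnings; compiling with -Werror=deprecated-declarations (GCC, Clang) or /we4996 (MSVC) turns them into errors.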

milianw