I'm trying to convert a vector of boost::filesystem::path to std::string, using the member function string(). I wrote this and it was working fine on Windows (MSVC 14, 2015):

std::transform(
    users.begin(), users.end(), std::back_inserter(usersStrs),
    std::mem_fn(static_cast<const std::string (PathType::*)() const>(
        &PathType::string)));

Now I moved to gcc (6.3, Debian Stretch), and my code gave a linking error saying that the signature above doesn't exist. To fix it, I had to change the code to:

std::transform(
    users.begin(), users.end(), std::back_inserter(usersStrs),
    std::mem_fn(static_cast<const std::string& (PathType::*)() const>(
        &PathType::string)));

PS: I know a lambda solution is easier, which I now switched to, out of necessity.

At first, I thought MSVC was more tolerant, but then I switched back to Windows and got the opposite linking error, saying that the first signature is the correct one. I went to the source code (1.64, path.hpp), and this is what I found:

#   ifdef BOOST_WINDOWS_API
    const std::string string() const
    {
      std::string tmp;
      if (!m_pathname.empty())
        path_traits::convert(&*m_pathname.begin(), &*m_pathname.begin()+m_pathname.size(),
        tmp);
      return tmp;
    }
//...
#   else   // BOOST_POSIX_API
    //  string_type is std::string, so there is no conversion
    const std::string&  string() const { return m_pathname; }
//...
#   endif

So the reasoning I see is that on Windows, since it doesn't use UTF-8 by default, there's a temporary conversion. But why wouldn't Boost use the same API for both Windows and Linux? Worst case, it would cost a copy of a string. Right?

Is there an alternative to path::string() that I should be using to have cross-platform API stability?

The Quantum Physicist
  • Why do you even need that `static_cast<>` in the first place? –  Aug 13 '17 at 20:37
  • @Frank It won't work without it, because there are overloads of `path::string()` with different signatures. The signature to use must be defined. – The Quantum Physicist Aug 13 '17 at 20:39
  • Maybe that is because on windows path is stored as 2-byte UTF-16 wide chars so conversion to ::std::string is required, while on Linux it is stored as utf-8 chars? I mean that trade off of this specific conversion method is better than trade off of converting string back and forward each time some path operation is performed? – user7860670 Aug 13 '17 at 20:46
  • @VTT I'm sorry, I thought I mentioned that in the question already. – The Quantum Physicist Aug 13 '17 at 20:47
  • Optimization is preferable to exactly the same definition. You should not depend on the actual return type in your code. Write your code in a way that it does not matter whether the data is returned by value or by reference. **Depending too much on implementation details is not a good idea anyway!** – Phil1970 Aug 13 '17 at 20:55
  • @Phil1970: quite the opposite; unless actually necessary, optimization shouldn't get in the way of a coherent API. Also, a function signature is almost never an implementation detail - it's part of the contract between library and user. – Matteo Italia Aug 13 '17 at 21:40
  • @Matteo Italia go and try to follow that on Windows, where functions have four possible calling conventions (thus, captureless lambdas can't be converted to bool implicitly) and three types of strings. I'd say their OS API isn't very coherent and revolves around C and C++ extensions, like zero-length arrays and such. – Swift - Friday Pie Aug 14 '17 at 06:17
  • I don't know about the lambda thing, but with the OS API you chose a bad example. It is documented down to the calling convention and is contractual with a promise not to break even ABI compatibility; they write "deprecated" all the time, but once a function is in a public header it will never go away. The various ways of passing stuff arise from 30+ years of history, with APIs such as `CreateWindow` being the same since Windows 1 (and being binary compatible as long as the processor supported it) - which is the reason why I know people still using the original 16 bit cardfile on Windows 7. – Matteo Italia Aug 14 '17 at 07:37

2 Answers

You may be using an old version of the Boost.Filesystem library. Boost 1.64 says the signature is:

string string(const codecvt_type& cvt=codecvt()) const;

The return type is not platform-dependent; it should always be a value, not a reference. Note that this (mostly) matches the C++17 FileSystem library's definition. So if you're getting a reference when the documentation says it's a value, then one of them is wrong. And thus, there's a bug either way.

However, it should be noted that in the C++ standard (and therefore, likely in Boost as well), the assumption for member functions is that they do not have to exactly match the documented specification. For example, a member function can have additional default parameters not listed in the standard. So long as it is callable as stated, that is a valid implementation.

Therefore, you should not expect std::mem_fn to work like this at all. Using C++ standard wording, there should be no assumption that path::string can be converted to a member pointer with that signature. So while it may be inconsistent, the expectation that you can get a member pointer may not be a supported interface for Boost.
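To make the point about member function signatures concrete, here is a minimal sketch. `Widget`, `name()` and `call_name()` are hypothetical names invented for illustration and are not Boost code; the sketch only assumes a member function that is callable with no arguments but is declared with an extra defaulted parameter.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical class, for illustration only; not part of Boost.
struct Widget {
    // Callable as w.name(), but the declared signature takes one parameter.
    std::string name(int verbosity = 0) const {
        return verbosity > 0 ? "widget (verbose)" : "widget";
    }
};

// The member pointer type includes the defaulted parameter...
static_assert(
    std::is_same<decltype(&Widget::name),
                 std::string (Widget::*)(int) const>::value,
    "the defaulted parameter is part of the member pointer type");

// ...so a static_cast to `std::string (Widget::*)() const` would not
// compile, while a lambda only needs the call expression x.name() to be valid.
inline std::string call_name(const Widget& w) {
    auto get = [](const Widget& x) { return x.name(); };
    return get(w);
}
```

Calling `call_name(Widget{})` goes through the lambda and never names the member pointer type, so it keeps working if the declaration gains or loses defaulted parameters.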

Whether it's a bug or not, you can resolve this easily enough by using a lambda:

std::transform(
    users.begin(), users.end(), std::back_inserter(usersStrs),
    [](const auto &pth) -> decltype(auto) {return pth.string();});

It's a lot cleaner looking than the std::mem_fn version. The decltype(auto) prevents an unnecessary copy if it returns a reference.
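As a rough check that the lambda is insensitive to the return type, the following sketch runs it against two stand-in types. `ByValuePath`, `ByRefPath` and `to_strings()` are hypothetical names made up here; they only mimic the two `string()` signatures quoted from `path.hpp` and do not use Boost itself.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two boost::filesystem::path variants.
struct ByValuePath {  // mimics the BOOST_WINDOWS_API signature
    std::string s;
    const std::string string() const { return s; }
};

struct ByRefPath {    // mimics the BOOST_POSIX_API signature
    std::string s;
    const std::string& string() const { return s; }
};

// Works for either stand-in: the lambda's return type is deduced from the
// call expression, so no member pointer signature is spelled out.
template <class PathType>
std::vector<std::string> to_strings(const std::vector<PathType>& users) {
    std::vector<std::string> usersStrs;
    usersStrs.reserve(users.size());
    std::transform(
        users.begin(), users.end(), std::back_inserter(usersStrs),
        [](const auto& pth) -> decltype(auto) { return pth.string(); });
    return usersStrs;
}
```

The same `to_strings()` template compiles for both stand-ins without any platform-specific cast, which is the property the question is after. It needs C++14 for the generic lambda and `decltype(auto)`.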

Nicol Bolas
  • On the Lambda thing, we definitely don't disagree. Actually this definition (with codecvt) is available in my boost (1.64), and it's also different for Windows and Linux (reference vs copy)... This is why I'm wondering whether it's a bug. Please check the source of Boost 1.64 and you'll see this in `path.hpp`. – The Quantum Physicist Aug 13 '17 at 21:12
  • @TheQuantumPhysicist: If the documentation does not match what the source says, then one of them is wrong. And *either* of them being wrong constitutes a bug. Also, see my edits. – Nicol Bolas Aug 13 '17 at 21:13
  • You're missing the `->` before the lambda return type. – aschepler Aug 13 '17 at 21:33
  • Standard reference for your third paragraph? Just "good enough" function signatures would make it impossible to portably disambiguate between overloads, which seems to me a serious limitation... – Matteo Italia Aug 13 '17 at 21:46
  • @MatteoItalia: See [member.functions]/2: "For a non-virtual member function described in the C++ standard library, an implementation may declare a different set of member function signatures, provided that any call to the member function that would select an overload from the set of declarations described in this International Standard behaves as if that overload were selected." But this is only true for member functions, not non-member functions. This is also why lambdas exist. – Nicol Bolas Aug 13 '17 at 21:52

As mentioned in the comments, a Windows path is stored as 2-byte UTF-16 wide characters, so a conversion to std::string is required. Boost's path.hpp has the following code for the Windows API, where string() is converted and wstring() is not:

#   ifdef BOOST_WINDOWS_API
    const std::string string() const
    {
      std::string tmp;
      if (!m_pathname.empty())
        path_traits::convert(&*m_pathname.begin(), &*m_pathname.begin()+m_pathname.size(),
        tmp);
      return tmp;
    }

    //  string_type is std::wstring, so there is no conversion
    const std::wstring&  wstring() const { return m_pathname; }

For the POSIX (Linux) API it is the other way around; here string() is returned as-is and wstring() is converted:

#   else   // BOOST_POSIX_API

    //  string_type is std::string, so there is no conversion
    const std::string&  string() const { return m_pathname; }

    const std::wstring  wstring() const
    {
      std::wstring tmp;
      if (!m_pathname.empty())
        path_traits::convert(&*m_pathname.begin(), &*m_pathname.begin()+m_pathname.size(),
          tmp);
      return tmp;
    }

For further reading, you can also consult this answer.

Sahib Yar