0

When going over all files in a directory with directory_iterator storing the c_str() name of a file before using it leads invalid reads (and garbage output).

This seems quite odd to me.

Code examples:

Working:

#include <iostream>
#include <filesystem>

namespace fs = std::filesystem;

int main() {
 for (auto const &entry : fs::directory_iterator("./")) {
   std::cout << entry.path().filename().c_str() << '\n';
 }
}

valgrind reports no errors.

Corrupt output:

#include <iostream>
#include <filesystem>

namespace fs = std::filesystem;

int main() {
 for (auto const &entry : fs::directory_iterator("./")) {
   auto filename = entry.path().filename().c_str();
   std::cout << filename << '\n';
 }
}

valgrind reports 159 invalid reads (of size 1) -- the exact number depends on how many files are in the directory.

Both these snippets have been compiled with gcc 9.1 using the command: g++-9.1 test.cpp -std=c++17

user3818491
  • 498
  • 1
  • 6
  • 16
  • 1
    Possible duplicate of [What is a dangling pointer?](https://stackoverflow.com/questions/17997228/what-is-a-dangling-pointer) – user975989 Jul 21 '19 at 00:34
  • Possible duplicate of [Why does calling c\_str() on a function that returns a string not work?](https://stackoverflow.com/questions/27627413/why-does-calling-c-str-on-a-function-that-returns-a-string-not-work) – Alan Birtles Jul 27 '19 at 06:41

1 Answers1

3

The lifetime of a temporary object is scoped to the statement it was created in. Informally speaking, a statement is a line of code that ends at a semicolon. All temporaries stay live until the end of the entire statement.

From the C++ spec:

When an implementation introduces a temporary object of a class that has a non-trivial constructor ([class.default.ctor], [class.copy.ctor]), it shall ensure that a constructor is called for the temporary object. Similarly, the destructor shall be called for a temporary with a non-trivial destructor ([class.dtor]). Temporary objects are destroyed as the last step in evaluating the full-expression ([intro.execution]) that (lexically) contains the point where they were created. This is true even if that evaluation ends in throwing an exception. The value computations and side effects of destroying a temporary object are associated only with the full-expression, not with any specific subexpression.

Dissecting the working example, we see that operator<< executes before destruction of temporaries.

  • entry.path() = temporary #1
  • .filename() = temporary #2
  • .c_str() gets the character pointer out of temporary #2
  • .c_str() passed into operator<< of std::cout while all of the above are still live
  • Call to operator<< taking the .c_str() pointer is executed and returns.
  • Call to operator<< taking '\n' is executed and returns.
  • All the temporaries are destructed.

Dissecting the broken example, we see a dangling pointer:

  • entry.path() = temporary #1
  • .filename() = temporary #2
  • .c_str() gets the character pointer out of temporary #2 and stored in variable filename
  • End-of-statement: all temporaries destructed. Now filename points at deleted memory -- it is a dangling pointer.
  • Call to operator<< is passed a dangling pointer, which it dereferences as if it were a valid string = undefined behavior.

You can pull out a local variable without corruption by removing the .c_str(), which makes variable filename an object of type std::filesystem::path. std::filesystem::path owns its memory (similar to std::string).

for (auto const &entry : fs::directory_iterator("./")) {
    auto filename = entry.path().filename();
    std::cout << filename << '\n';
}

path also supports ostream output directly, without the need for .c_str().

fifoforlifo
  • 724
  • 5
  • 9