100

Are there pitfalls on specific operating systems that I should know of?

There are many duplicates (1, 2, 3, 4, 5) of this question, but they were answered decades ago. The highly voted answers to many of these questions are wrong today.

Methods from other (old) Q&As on .sx:

  • stat.h (wrapper sprintstatf), uses a syscall

  • tellg() returns, by definition, a position, but not necessarily a number of bytes. Its return type is not int.

Kelvin Hu
Jonas Stein
  • 4
    Starter for 10: https://en.cppreference.com/w/cpp/header/filesystem – Richard Critten Jun 30 '19 at 20:36
  • How, exactly, do those answers go wrong? – L. F. Jul 01 '19 at 01:50
  • 5
    @L.F.: Well, the first question has been closed as a duplicate of the second, which explains why the accepted answer in the first *is wrong*. The third one is asking about similar `tellg` problems. The only one worth bothering with is the fourth one, and that one's not great, since it talks too much about `ofstream`, in both the question and its answers. This one is far better at expressing the intent than the others (except for the first, which is oddly closed). – Nicol Bolas Jul 01 '19 at 05:06
  • 6
    Please stop adding irrelevant information to your question and the question title. The year is irrelevant; the technologies are relevant. – elixenide Jul 01 '19 at 21:08
  • 2
    What's wrong with `stat(2)` anyways? Has it grown too old or what? – Lorinczy Zsigmond Jul 02 '19 at 12:11
  • 1
    @LorinczyZsigmond *What's wrong with `stat(2)`* It's not part of the language standard. – Andrew Henle Jul 02 '19 at 13:31
  • @LorinczyZsigmond and Andrew: Thank you for the question. I have added a line to the question – Jonas Stein Jul 02 '19 at 15:52
  • @TedLyngmo The question explains already why it should not be marked as duplicate. I linked the mentioned question and explained, why it makes sense to ask this question again with the scope of C++17. Please remove the `duplicate` tag. – Jonas Stein Jul 02 '19 at 16:01
  • @JonasStein Ok, I apparently got a downvote on the answer I gave there after I marked this as a duplicate. Why, I don't know, since it was a good answer to the question asked. I marked this as a duplicate since the answer contains two parts, one pre-C++17 and one for C++17 where `<filesystem>` is used, but as Nicol implied, it was perhaps too embedded in that question's rotating log functionality to be of much use here. – Ted Lyngmo Jul 02 '19 at 16:11

2 Answers

135

<filesystem> (added in C++17) makes this very straightforward.

#include <cstdint>
#include <filesystem>

// ...

std::uintmax_t size = std::filesystem::file_size("c:\\foo\\bar.txt");

As noted in comments, if you're planning to use this function to decide how many bytes to read from the file, keep in mind that...

...unless the file is exclusively opened by you, its size can be changed between the time you ask for it and the time you try to read data from it.
– Nicol Bolas
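
One way to stay safe against that race is to treat the reported size as a hint only and let the stream decide where the file ends. The following is a minimal sketch, not a definitive implementation; `read_whole_file` is a hypothetical helper name and the 4096-byte buffer is an arbitrary choice.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not part of the standard library): reads a whole
// file into a string without trusting the size reported beforehand.
std::string read_whole_file(const fs::path& p)
{
    std::ifstream in(p, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + p.string());

    std::string data;

    // file_size() is used only to pre-allocate. If the file grows or
    // shrinks afterwards, the loop below still reads what is really there.
    std::error_code ec;
    const auto hint = fs::file_size(p, ec);
    if (!ec && hint <= data.max_size())
        data.reserve(static_cast<std::size_t>(hint));

    // Read fixed-size chunks until the stream reports end of file.
    // gcount() is the number of bytes the last read() actually delivered.
    char buf[4096];
    while (in.read(buf, sizeof buf) || in.gcount() > 0)
        data.append(buf, static_cast<std::size_t>(in.gcount()));

    return data;
}
```

Calling `read_whole_file("c:\\foo\\bar.txt")` would then return the bytes that were actually readable, regardless of what `file_size` said a moment earlier.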

HolyBlackCat
  • 12
    Little offtopic: is there a world where `std::uintmax_t` will be able to hold greater values than `std::size_t`? If not, why not use `std::size_t`, which arguably is more recognisable? +1 on the answer, btw – Fureeish Jun 30 '19 at 20:39
  • 14
    @Fureeish I used just because that's the type `file_size` returns. Looks slightly weird to me too. – HolyBlackCat Jun 30 '19 at 20:40
  • 40
    @Fureeish `std::size_t` is only required to hold the max size of in memory objects. Files can be considerably larger, – Richard Critten Jun 30 '19 at 20:42
  • 3
    @RichardCritten so the answer to the first question of my comment is "yes"? – Fureeish Jun 30 '19 at 20:42
  • 29
    @Fureeish Well, on 32-bit Windows (and I assume on most modern 32-bit platforms) `size_t` is 32 bits, and `uintmax_t` is 64 bits. – HolyBlackCat Jun 30 '19 at 20:43
  • 1
    What does file_size return in case of an error? Is it `0xFFF..FF`, because it is `uint`? I think the https://en.cppreference.com/w/cpp/filesystem/file_size page has 3 contradicting answers to this question. – Jonas Stein Jun 30 '19 at 20:55
  • 1
    @JonasStein: "*What does file_size return in case of an error?*" The value is meaningless because the function errored out. That error means you're not supposed to use it, and checking errors isn't optional. That being said, that page (and the standard) only has one answer for this case: -1, cast to a `uintmax_t`. So I don't see the "3 contradicting answers". – Nicol Bolas Jul 01 '19 at 04:58
  • 16
    @HolyBlackCat: It would be good to say something about the fact that the filesystem is global, and thus unless the file is *exclusively* opened by you, its size can be changed between the time you ask for it and the time you try to read data from it. – Nicol Bolas Jul 01 '19 at 04:59
  • 2
    @JonasStein The overload mentioned throws an exception on error. The other overload returns `static_cast<std::uintmax_t>(-1)` and stores the corresponding error to `ec`. – L. F. Jul 01 '19 at 05:11
  • 6
    Can't we just agree on `auto size = std::filesystem::file_size("c:\\foo\\bar.txt");`? – Simon Richter Jul 01 '19 at 09:45
  • 1
    This edit seems pedantic. The `stat` function and friends are strictly only correct from the time the kernel interrogates the logical FS, and may have already changed once the file status struct has been made available to the caller. Suggestions on file locking, like `m[un]lock` for memory, would be more constructive. – Brett Hale Jul 01 '19 at 13:02
  • @BrettHale I don't know much about file locks unfortunately, that's why I added Nicol's comment to the answer without going into further details. – HolyBlackCat Jul 01 '19 at 15:17
  • @NicolBolas Even "exclusively opened by you" is misleading. What if you wrote both the code that writes your logs to the file in the background, and the code that prints the most recent few lines on demand? – Nic Jul 01 '19 at 15:42
  • 2
    @NicHartley: Then you have broken your own code, and therefore you on some level both know you've broken it and have the tools to fix it. With other processes coming along and changing the file size behind your back, neither of those is the case. So we're talking about very different scenarios. – Nicol Bolas Jul 01 '19 at 15:50
  • @NicolBolas - this is only response to the 'self-own' :) – Brett Hale Jul 01 '19 at 16:28
  • @NicolBolas My point was that just because you wrote the code, doesn't mean it's safe, which is what you implied. As long as _that file isn't modified between reading the size and depending on it_, it's safe; if it's accessed outside, even by your own code, it's not. That outside code might be outside of that block in another thread, in a library you're calling, in another process, whatever. – Nic Jul 01 '19 at 16:46
  • @Fureeish I believe there is: systems with 32-bit `size_t` and large-file support. What I really wonder is, why not use `off_t`? Isn't this exactly what it's for? – Davislor Jul 01 '19 at 18:00
  • @Fureeish Also of historical interest: 16-bit implementations and those with segmented memory. – Davislor Jul 01 '19 at 18:01
  • 1
    @Davislor `off_t` is not defined in standard C and I suspect not in C++ either. See https://stackoverflow.com/q/9073667/2410359 – chux - Reinstate Monica Jul 02 '19 at 09:31
  • @chux You’re right, so I guess they went with the maximum type because it would be guaranteed as wide as any implementation specific `off_t`, `unsigned long long long int`, etc. – Davislor Jul 02 '19 at 12:57
  • if I'm updating/writing to a file with `fstream` and iteratively call this at the top of a loop to determine size so I can find the new first and last records, will it work? – mLstudent33 May 23 '20 at 18:30
  • @mLstudent33 Hard to say anything without seeing the code. I suggest asking a separate question about it, with the code. – HolyBlackCat May 23 '20 at 19:59
  • I was reading in objects from a file for read and also writing. These objects are sequentially numbered so I decided to push them on to a vector as they're read one by one and then do vector.size(). Then the new object I'm creating in writing to file would be numbered vector.size()+1. – mLstudent33 May 23 '20 at 22:25
31

C++17 brings std::filesystem, which streamlines a lot of tasks on files and directories. Not only can you quickly get a file's size and attributes, but you can also create new directories, iterate through files, and work with path objects.

The new library gives us two functions that we can use:

std::uintmax_t std::filesystem::file_size( const std::filesystem::path& p );

std::uintmax_t std::filesystem::directory_entry::file_size() const;

The first function is a free function in std::filesystem, the second one is a method in directory_entry.

Each function also has an overload: one version throws an exception on failure, the other reports the error through a std::error_code output parameter. The code below demonstrates both cases.

#include <filesystem>
#include <iostream>
#include <system_error>

namespace fs = std::filesystem;

int main()
{
    try
    {
        const auto fsize = fs::file_size("a.out");
        std::cout << fsize << '\n';
    }
    catch (const fs::filesystem_error& err)
    {
        std::cerr << "filesystem error! " << err.what() << '\n';
        if (!err.path1().empty())
            std::cerr << "path1: " << err.path1().string() << '\n';
        if (!err.path2().empty())
            std::cerr << "path2: " << err.path2().string() << '\n';
    }
    catch (const std::exception& ex)
    {
        std::cerr << "general exception: " << ex.what() << '\n';
    }

    // using error_code
    std::error_code ec;
    const auto size = fs::file_size("a.out", ec);
    if (!ec)
        std::cout << "size: " << size << '\n';
    else
        std::cerr << "error when accessing test file, size is: "
                  << size << " message: " << ec.message() << '\n';
}
GOVIND DIXIT
  • 2
    What exactly is "this"? Can you explain what all this code is used for, especially when the accepted answer uses much less code? – Nico Haase Jul 02 '19 at 15:24
  • The accepted answer ignores exceptions, which can be thrown when the file doesn't exist, which is pretty common. – Robert Nov 26 '22 at 05:54