
I have a recording application that reads data from a network stream and writes it to a file. It works very well, but I would like to display the file size as the data is being written. Every second, the GUI thread updates the status bar with the elapsed recording time, and at that point I would also like to display the current file size.

I originally consulted this question and have tried both the stat method:

```cpp
struct stat stat_buf;
int rc = stat(recFilename.c_str(), &stat_buf);
std::cout << recFilename << " " << stat_buf.st_size << "\n";
```

(no error checking for simplicity) and the fseek method:

```cpp
FILE *p_file = NULL;
p_file = fopen(recFilename.c_str(), "rb");
fseek(p_file, 0, SEEK_END);
long size = ftell(p_file);  // ftell returns long, not int
fclose(p_file);
```

but either way, I get 0 for the file size. When I go back and look at the file I write to, the data is there and the size is correct. The recording is happening on a separate thread.

I know that bytes are being written because I can print the size of the data as it is written, alongside the output of the methods shown above.

[Screenshot of the console output]

The filename plus the 0 is what I print out from the GUI thread. 'Bytes written x' is out of the recording thread.

dmedine
  • If you are using **fwrite()** to write to the file, it returns the number of items actually written. Why don't you use those values, handling errors if any, to track the size of the file being written? Or am I misunderstanding your question? – Biruk Abebe Apr 13 '16 at 18:11
  • Actually, the recording itself uses a `std::streambuf` object to write the data as little-endian binary. I could track the number of bytes there, but I would rather keep the recording routine completely independent of the GUI (i.e. they shouldn't share any data whatsoever). – dmedine Apr 13 '16 at 18:19
  • You're going to have to keep track from within the function that writes the file. Buffering will prevent you from ever getting accurate size information using file functions such as `stat`, `ftell`, etc. – Carey Gregory Apr 13 '16 at 18:29
  • What do you mean by "go back and look at the file"? Did you snapshot the file back at the time a zero was displayed and then you looked at that snapshot? If you're saying you found the correct data in the file *later*, then maybe the size of the file *was* zero at the time the display said it was zero. Do you have solid evidence that this displayed a zero at a time when the size of the file was not zero? Perhaps it's all working perfectly and the data just hasn't been written to the file yet. – David Schwartz Apr 13 '16 at 18:39
  • @Carey Gregory precision is not super important. I just want to have a ballpark figure for how big the file is getting while the user is recording. – dmedine Apr 13 '16 at 18:55
  • @David Schwartz I mean that when I finish recording, I inspect the file in my file explorer and see how big it is. The previous implementation of this GUI was in Python, but it used exactly the same C++ code to do the actual recording. In the Python version I used `os.path.getsize` to track the size during recording, and it showed the growth as expected (which is to say, yes, I have reason to believe the 0 figure is not correct). – dmedine Apr 13 '16 at 18:59
  • Also, when I let the recording keep running for minutes at a time, the displayed size output doesn't change, but the actual size of the file does. – dmedine Apr 13 '16 at 19:02
  • @dmedine So you never actually checked the size of the file with the operating system and confirmed that it didn't match the file size displayed? If that's true, I don't agree that you have any reason to believe the 0 figure is not correct. The file write could have occurred later. You have no way to know which of two problems you have -- an error in the file writing code causing the writes to occur too late to be reported or an error in the file size displaying code causing it to display a zero size. – David Schwartz Apr 13 '16 at 19:02
  • @David Schwartz Yes, I did check the size of the file with the operating system, in the Python implementation. Again, the recording routine in both implementations is the exact same C++ code. The only difference is the GUI code (and the code that checks the file size in a new thread while the file is being written). – dmedine Apr 13 '16 at 19:07
  • P.S. I've edited the question to show that the problem is most definitely with the GUI thread. – dmedine Apr 13 '16 at 19:16
  • You still haven't done the test I asked for and you're still jumping to the conclusion that the problem is the reporting. I can think of a number of ways you'll get these results and the file size actually is zero. The same C++ code might in some cases write to a file immediately and in some cases not do so until later. You could easily be reporting the number of bytes queued to be written to the file later rather than bytes actually written to the file. Please do the definitive test -- see that the file size is reported as non-zero by the OS and that your code later sees it as zero. – David Schwartz Apr 13 '16 at 19:20
  • @dmedine You won't get even vaguely accurate results, much less precise. Until buffers are flushed to disk there's just no way for an app other than the writing app to know how big an open file is. – Carey Gregory Apr 13 '16 at 19:32
  • @CareyGregory That's false. Buffers do not need to be flushed to disk for the data to be read back or the size to be reported. The size of the open file will be correctly reported by the operating system regardless of any buffering or caching that takes place. – David Schwartz Apr 13 '16 at 19:45
  • @DavidSchwartz I recommend you write some code and see for yourself. The runtime libraries do buffering, and the OS knows nothing of what's in those buffers until they're flushed. It simply cannot report accurate results on an open file with another process doing buffered I/O to it. Impossible. – Carey Gregory Apr 13 '16 at 20:21
  • Then why is it that `os.path.getsize` (which I assume is written in C to begin with) can do it? I think that is exactly the answer I am searching for. – dmedine Apr 13 '16 at 21:02
  • @CareyGregory The OS knows nothing of application buffers that *have nothing to do with the file*. He's asking for the size *of the file*. The OS absolutely *can* and *will* report 100% accurate sizes of an open file while a process is doing buffered I/O to it. Anything in such an application buffer *has not* been written to the file yet. – David Schwartz Apr 13 '16 at 21:34
  • @DavidSchwartz Now you seem to be agreeing with exactly what I said. – Carey Gregory Apr 13 '16 at 21:54
  • Yes but how can I do this since the above methods are clearly failing? – dmedine Apr 13 '16 at 22:16
  • @CareyGregory Say you had some magic function that reported the size of the file including application buffers. Then say the application crashes before it can write those buffers to the file. Are you going to say the file shrunk? Or are you going to say the application "unwrote" them? Until the data has been passed to the OS, it has not yet been written to the file and reporting that as part of the size of the file would be erroneous and lead to absurd results. The question asks how to get the size of the file, and the OS knows the correct size. The buffering is irrelevant. – David Schwartz Apr 13 '16 at 22:31
  • @DavidSchwartz Fine, if you're going to take a dogmatic perspective like that then we completely agree. However, I seriously doubt that the OP has any use for that number since it will continue to be grossly different than what the application has written until the file is closed. The OP wants a progress indicator, and the physical contents of the file on disk are a terrible measure of that. – Carey Gregory Apr 13 '16 at 22:36
  • @CareyGregory I've tried to explain that to the OP, but he is insistent that he wants the actual size of the file, not how big the file will be when the application's buffers are flushed. Look at the comments exchanged between me and him. – David Schwartz Apr 13 '16 at 22:47
  • @dmedine One more time, to 100% clarify, you want the actual size of the file just as the OS would report it. Is that correct? You have confirmed that asking the operating system for the size of the file gives the size you want at the time that your application is reporting zero. Is that again correct? – David Schwartz Apr 13 '16 at 22:48
  • Yes. I want the C equivalent of `os.path.getsize` in Python, which most definitely reports an incrementally growing file throughout the repeated calls to `std::streambuf::sputn` that write binary data to said file. The methods shown in my question report 0, which I believe to be wrong. – dmedine Apr 14 '16 at 17:28
  • So after some more investigation, it turns out that printing from my file-write function was stalling it and delaying the write operation, so I was writing much less data than I had thought and the progress was very slow. After 4096 bytes (Windows' file buffer size, I guess) it shows progress. – dmedine Apr 14 '16 at 18:51
  • After the Pepsi challenge, it looks like the `fstream` method is the fastest and most precise, but I am just judging by eye so don't quote me on that. The `stat` method seems to be lagging behind the actual file size and also slightly slowing down the GUI updates. This could be due to an OS/priority issue more than anything. I don't know. I am going to stick with the `fstream` technique in my code. Thank you all for an interesting discussion! – dmedine Apr 14 '16 at 19:04
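The disagreement in these comments comes down to which bytes count: `stat` reports only what the writer has already handed to the OS, while anything still sitting in the writer's stream buffer is invisible to it. A minimal sketch of that effect, assuming a POSIX-style `stat` as in the question and a `std::ofstream` standing in for the OP's `std::streambuf`; `os_file_size` and `size_after_flush` are hypothetical helper names:

```cpp
#include <sys/stat.h>

#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Size of `path` as the OS currently reports it, or -1 if stat() fails.
// Same call as in the question, plus an error check.
long long os_file_size(const std::string& path) {
    struct stat st {};
    if (stat(path.c_str(), &st) != 0) return -1;
    return static_cast<long long>(st.st_size);
}

// Writes `n` bytes through a buffered std::ofstream (a stand-in for the OP's
// std::streambuf) and returns the OS-reported size after flush().
long long size_after_flush(const std::string& path, std::size_t n) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    const std::string chunk(n, '\0');  // placeholder for recorded data
    out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));

    // Before flush(), some or all of the bytes may still be in the stream's
    // buffer, so another thread calling stat() can see anything from 0 to n.
    const long long before = os_file_size(path);
    assert(before >= 0 && before <= static_cast<long long>(n));
    (void)before;  // only used by the assert

    out.flush();                // hand the buffered bytes to the OS
    return os_file_size(path);  // now includes all n bytes
}
```

Without the `flush()`, a poller sees the file grow only when the stream's buffer fills, roughly in buffer-sized steps.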

4 Answers


You can read all about C++ file handling here: http://www.cplusplus.com/doc/tutorial/files/

This is an example of how I would do it.

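```cpp
#include <fstream>

// Opens the file positioned at its end (std::ifstream::ate), so tellg()
// returns the current size, or pos_type(-1) if the file could not be opened.
std::ifstream::pos_type filesize(const char* file)
{
    std::ifstream in(file, std::ifstream::ate | std::ifstream::binary);
    return in.tellg();
}
```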

Hope this helps.

Steve Moore
  • You wouldn't be able to use this to display the size as the file is written, though. – Stephen Apr 13 '16 at 18:21
  • The OP could, however, write a recursive function to do so using it. – suroh Apr 13 '16 at 18:25
  • @Afflicted Nope. Disk buffering will prevent `tellg` from giving you accurate results. The writing app would have to either use unbuffered I/O, which will probably kill performance, or do frequent flushes. – Carey Gregory Apr 13 '16 at 18:31
  • @Stephen you are correct. The problem seems to be in the fact that the file is open elsewhere and being written to. In python I can just use `os.path.getsize` on the filename to get its size, even as it's being written to. I want to find out how to do this in C/C++. – dmedine Apr 13 '16 at 18:33
  • @CareyGregory What "disk buffering" are you talking about? Surely a sane operating system makes any disk buffering invisible to processes. Surely every sane OS and streams implementation will report the correct size of the file to the application. – David Schwartz Apr 13 '16 at 18:41
  • @David Schwartz one would assume, however I am going to fire up my laptop and test it. I'm intrigued! – Steve Moore Apr 13 '16 at 18:59
  • @DavidSchwartz No, not at all. Unless you're using unbuffered I/O it's difficult to know for sure how large an open file is until the buffers are flushed. How would an OS know what a C (or C++) library has sitting in its cache? – Carey Gregory Apr 13 '16 at 19:30
  • @CareyGregory Something sitting in the library's cache hasn't been written to the file yet. He's asking how to get the size of the file, not how to get the size of some application cache or buffer. – David Schwartz Apr 13 '16 at 19:44
  • @DavidSchwartz I know what he's asking. If an app does something like `fwrite` to an open file, the data from that write will most likely sit in a buffer and not be written immediately to disk. If you query the file size at that point from another app, it will be **zero.** This is extremely common behavior and has been for a very long time. – Carey Gregory Apr 13 '16 at 20:25
  • @CareyGregory You're confusing two different things. Whether or not the data is written to disk is irrelevant. The operating system will check the cache, not the disk. And he's asking the size *of the file*, not the size of some application buffer. – David Schwartz Apr 13 '16 at 21:33
  • @DavidSchwartz I'm not talking about "some application buffer." When you call `fwrite` or similar, it's extremely unlikely your data will end up in an OS cache immediately. It will be buffered, potentially for quite a while depending on the buffer size and app behavior. You could have demonstrated this for yourself in a whole lot less time than we've been arguing about it. But if you're convinced that functions like `fwrite` go right to disk (or disk cache) immediately, feel free to continue with that fantasy. – Carey Gregory Apr 13 '16 at 21:59
  • BTW, in my application, this answer misbehaves just like the two methods I show in my question. I'm not saying it doesn't work if the file is already closed, but apparently not when it is open. – dmedine Apr 13 '16 at 22:18
  • @CareyGregory Why do you say you're not talking about "some application buffer" and then go on to talk about some application buffer?! Yes, I know, functions like `fwrite` go into some application buffer, but that has nothing to do with the size of the file, which is what this question is about. – David Schwartz Apr 13 '16 at 22:27
  • @DavidSchwartz This is becoming almost comical so clearly we're not communicating very well and it's pointless to continue. I'll simply restate that you're not going to get an accurate size of the file using any of the methods discussed here. It's simply not possible. The only way to obtain an accurate size in near real time is for the writing application to provide it. – Carey Gregory Apr 13 '16 at 22:33
  • @CareyGregory Except that will provide an inaccurate size, unless you think it makes sense to say that when the application crashes it "unwrites" to the file. This answer gives the OP what he insists that he needs. – David Schwartz Apr 13 '16 at 22:46
  • @DavidSchwartz Okay, I hadn't really kept up with the discussion between you and the OP. You and I actually agree. OP needs to face reality. – Carey Gregory Apr 13 '16 at 23:07

As a last resort, you can call `ftell` in the thread that writes the data, or keep a variable that tracks how much data has been written. As for the real problem, you must be making a mistake somewhere; maybe `fopen` never actually opens the file, or something like that.

Here is some test code showing that this works, at least in a single-threaded app:

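```cpp
#include <cstdio>
#include <iostream>

int main()  // originally _tmain(int argc, _TCHAR* argv[]), which is MSVC-specific
{
    FILE* mFile = fopen("hi.txt", "a+");  // writer, append mode
    if (!mFile) return 1;

    // fputs("fopen example", mFile);     // fputs and fwrite behave the same here
    fwrite("fopen ex", 1, 8, mFile);      // 8 chars; a count of 9 would also write the '\0'

    // Per POSIX, fseek() writes any buffered output to the file first,
    // which is why the second handle below sees the new bytes.
    fseek(mFile, 0, SEEK_END);
    std::cout << ftell(mFile) << ":";

    // A second, independent handle reports the same size
    FILE* mFile2 = fopen("hi.txt", "rb");
    if (mFile2) {
        fseek(mFile2, 0, SEEK_END);
        std::cout << ftell(mFile2) << std::endl;
        fclose(mFile2);
    }

    fclose(mFile);

    getchar();  // keep the console window open
    return 0;
}
```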
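The "variable to track the amount of data that is written" alternative can be made thread-safe with `std::atomic`, so the GUI thread only reads a counter and never touches the recording stream. A minimal sketch, assuming the recorder owns its output stream; `CountingRecorder` is a hypothetical name, not part of the OP's code:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical recorder: the recording thread calls write(); the GUI thread
// may call bytes_written() at any time without locking.
class CountingRecorder {
public:
    explicit CountingRecorder(const std::string& path)
        : out_(path, std::ios::binary | std::ios::trunc) {}

    bool ok() const { return static_cast<bool>(out_); }

    // Recording thread only.
    void write(const char* data, std::size_t n) {
        assert(data != nullptr || n == 0);
        out_.write(data, static_cast<std::streamsize>(n));
        if (out_) {
            // Relaxed ordering is enough for a progress display: readers
            // only need an eventually current count, not ordering with
            // any other data.
            written_.fetch_add(n, std::memory_order_relaxed);
        }
    }

    // Any thread. Counts bytes accepted by the stream, including bytes
    // still in its buffer, so it can run ahead of what stat() reports.
    std::uint64_t bytes_written() const {
        return written_.load(std::memory_order_relaxed);
    }

private:
    std::ofstream out_;
    std::atomic<std::uint64_t> written_{0};
};
```

This keeps the GUI independent of the stream itself; the only shared state is one atomic integer.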
Rezniaq

Just call `freopen` before calling `stat`. It seems that `freopen` refreshes the file length, presumably because it first closes the stream, which flushes any buffered output to the file.
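A sketch of why this seems to work, and of a cheaper alternative: `freopen` must be called on the writer's own `FILE*`, and it is assumed here to reopen in append mode, since reopening with `"wb"` would truncate the recording. `std::fflush` hands the buffered bytes to the OS without closing anything. `stat_size`, `size_via_freopen`, and `size_via_fflush` are hypothetical helpers:

```cpp
#include <sys/stat.h>

#include <cassert>
#include <cstdio>

// OS-reported size of `path`, or -1 if stat() fails.
long long stat_size(const char* path) {
    struct stat st {};
    return stat(path, &st) == 0 ? static_cast<long long>(st.st_size) : -1;
}

// The freopen() approach: freopen() closes the stream first (flushing it),
// then reopens the same file in append mode so later writes still go to
// the end. If reopening fails, fp becomes nullptr and the stream is gone.
long long size_via_freopen(std::FILE*& fp, const char* path) {
    assert(fp != nullptr);
    fp = std::freopen(path, "ab", fp);
    return fp ? stat_size(path) : -1;
}

// Cheaper: flush the buffered bytes to the OS without closing the stream.
long long size_via_fflush(std::FILE* fp, const char* path) {
    assert(fp != nullptr);
    return std::fflush(fp) == 0 ? stat_size(path) : -1;
}
```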

TylerD007

I realize this post is rather old at this point, but in response to @TylerD007: while that works, it is incredibly expensive if all you're trying to do is get the number of bytes written.

In C++17 and later, you can simply include the `<filesystem>` header and call
`auto fileSize {std::filesystem::file_size(filePath)};`, after which `fileSize` holds the actual size of the file.
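`file_size` reports the size the OS knows about, so, like `stat`, it sees only data the writer has already handed to the OS. The one-argument form throws `std::filesystem::filesystem_error` on failure (for example, if the file does not exist yet). A minimal sketch of a polling helper using the non-throwing `std::error_code` overload, which may suit a GUI timer better; `poll_file_size` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <system_error>

// Returns the OS-reported size of `path`, or -1 if it cannot be determined
// (missing file, directory, permission error, ...). Never throws.
std::int64_t poll_file_size(const std::filesystem::path& path) {
    assert(!path.empty());
    std::error_code ec;
    const std::uintmax_t size = std::filesystem::file_size(path, ec);
    if (ec) return -1;
    return static_cast<std::int64_t>(size);
}
```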

  • Why use { } initialization where it makes no sense? At first I thought you were defining a function here... – denis.gz Jan 11 '23 at 15:28
  • To be fair, that's just a habit I've gotten into - as someone who is self-taught, I'm not sure if that's technically correct syntax, but I use it more so for clear intention in avoiding any implicit casts - which I guess would be unnecessary here since I'm using the auto keyword anyways. – Ryan McCullough Jan 12 '23 at 17:10