I would like to write a simple API which

  1. allows the user to open a file,
  2. lets the user write data to the file,
  3. tracks the write calls and sanity-checks the written data after each write call,
  4. prevents the data from being written to disk if it is not valid -> discard(file)

As a starting point I wrote the test program below, which opens a file in fully buffered "wb+" mode using fopen and setvbuf. The stream is opened in fully buffered mode for the following reason:

http://www.cplusplus.com/reference/cstdio/setvbuf/

mode Specifies a mode for file buffering.

Three special macro constants [...]:

_IOFBF Full buffering: On output, data is written once the buffer is full (or flushed). On Input, the buffer is filled when an input operation is requested and the buffer is empty.

My test program contains comments marking where a validity check could be placed and where the buffer contents should be discarded.

My question is: how do I accomplish the discard(file) operation, i.e. the step of getting rid of invalid buffer contents?

The idea is to assemble some data in the buffer, run a validity check after each write operation (or after several), and write the data to disk only if it is valid. Therefore I would need to discard the buffer if the validity check fails. When the validity check passes, the whole buffer contents should be written to the file.

My code draft looks like this (a simplified example):

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    static uint8_t buffer[10000];
    
    /* The following would be part of mylib_init */
    FILE *file = fopen("test", "wb+");
    
    if (file == NULL){
        fprintf(stderr, "open error!\n");
        exit(-1);
    }
    
    if ( 0 != setvbuf(file , buffer, _IOFBF , sizeof(buffer) ) ){
        fprintf(stderr, "Could not set buffer!\n");
        fclose(file);
        exit (-2);
    }
    
    /* The following would be part of mylib_write_data.
       Each write and check resembles one func call */

    // Pretend the user writes some data into the file
    // ...
    // fwrite(x)
    
    if (data_in_buffer_not_valid(buffer)){
       discard(file);
    }

    // ...
    // fwrite(y)
    //

    if (data_in_buffer_not_valid(buffer)){
       discard(file);
    }

    // ...
    // fwrite(z)
    // ...
    
         
    // The following would be part of mylib_exit
    // Cleanup stuff
    fclose(file);

    return 0;
}
  • You shouldn't do this using stdio buffering. It doesn't have a discard operation, and it can flush the buffer before you call `fflush()`. You should put the data directly in your own buffer using functions like `sprintf()` and `strcpy()`, which you write to the file when you want. Then you can just empty the buffer if you want to discard it. – Barmar Jan 11 '21 at 16:21
  • That would be doable but a little complicated, as I would need to imitate stream I/O on memory (which is not as easy on Windows as it is on Linux). Would `freopen ("nul","w",file);` be possible? – Wör Du Schnaffzig Jan 11 '21 at 16:30
  • I forgot to say that this was the reason why I used full buffering. As I understand it, the buffer should be written to disk at the earliest when it is completely filled. I can ensure beforehand that this does not happen (either by checks or by constraints on the written data). – Wör Du Schnaffzig Jan 11 '21 at 16:48

1 Answer

If you want something like a "scratch" temporary file to write your data into and retrieve later, the portable interface is tmpfile(), which was created for exactly that. Write to that file, and when you're ready, rewind it and copy it block by block to another file.
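A minimal sketch of that staging pattern, assuming the library writes everything to a stream obtained from tmpfile() and only copies it to the real file once the data has passed validation. The helper name commit_staged and the 4096-byte chunk size are hypothetical, chosen for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: copy everything staged in `staging` (a stream
   returned by tmpfile()) into a new file at `path`.
   Returns 0 on success, -1 on error. */
int commit_staged(FILE *staging, const char *path)
{
    char chunk[4096];
    size_t n;
    FILE *out;

    /* Push pending stdio data to the temp file, then go back to the start. */
    if (fflush(staging) != 0)
        return -1;
    rewind(staging);

    out = fopen(path, "wb");
    if (out == NULL)
        return -1;

    /* Copy block by block. */
    while ((n = fread(chunk, 1, sizeof chunk, staging)) > 0) {
        if (fwrite(chunk, 1, n, out) != n) {
            fclose(out);
            remove(path);   /* do not leave a partial file behind */
            return -1;
        }
    }
    if (ferror(staging)) {
        fclose(out);
        remove(path);
        return -1;
    }
    return fclose(out) == 0 ? 0 : -1;
}
```

Discarding needs no extra code in this scheme: calling fclose() on the stream returned by tmpfile() removes the temporary file, so invalid data never reaches the target path.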

On Linux you can use fmemopen and fopencookie to write to a buffer via FILE*. These functions are not available on Windows.
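As a rough, POSIX-only illustration of the fmemopen route: existing code keeps writing through a FILE*, but the bytes land in a caller-owned array that can be checked before anything touches the disk. The helper names stage_open and stage_finish are hypothetical, and the sketch assumes the staged data fits in the array:

```c
#define _POSIX_C_SOURCE 200809L  /* for fmemopen */
#include <stdio.h>

/* Hypothetical helper: open a write-only stream backed by `buf`. */
FILE *stage_open(char *buf, size_t size)
{
    return fmemopen(buf, size, "w");
}

/* Hypothetical helper: flush and close the memory stream.
   Returns the number of bytes staged in the array, or -1 on error
   (for example when the data did not fit). */
long stage_finish(FILE *mem)
{
    long n;

    if (fflush(mem) != 0) {
        fclose(mem);
        return -1;
    }
    n = ftell(mem);          /* current position == bytes written */
    if (fclose(mem) != 0)
        return -1;
    return n;
}
```

After stage_finish() the caller can validate the first n bytes of the array and either fwrite() them to the real file or simply ignore them, which is the discard case.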

I would also strongly consider creating your own interface that stores the result in memory. Writing an interface like struct mystream; mystream_init(struct mystream *); mystream_printf(struct mystream *, const char *fmt, ...); etc. is one of the tasks you sometimes do in C when fopencookie is not available. Also consider designing the interface around storing data, so that instead of calling fwrite you call a function that checks the data, writes it, and processes it along the way.
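One possible shape for such an interface, as a sketch only. The names struct mystream, mystream_init and mystream_printf come from the paragraph above; the fields and the functions mystream_write, mystream_discard, mystream_commit and mystream_free are assumptions added for illustration:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct mystream {
    char  *data;  /* staged bytes, owned by the stream */
    size_t len;   /* number of bytes currently staged */
    size_t cap;   /* allocated capacity */
};

void mystream_init(struct mystream *s)
{
    s->data = NULL;
    s->len = 0;
    s->cap = 0;
}

/* Make room for at least `extra` more bytes; returns 0 on success. */
static int mystream_reserve(struct mystream *s, size_t extra)
{
    size_t need, cap;
    char *p;

    if (extra > (size_t)-1 - s->len)
        return -1;                       /* size overflow */
    need = s->len + extra;
    if (need <= s->cap)
        return 0;
    cap = s->cap ? s->cap : 256;
    while (cap < need) {
        if (cap > (size_t)-1 / 2) { cap = need; break; }
        cap *= 2;
    }
    p = realloc(s->data, cap);
    if (p == NULL)
        return -1;
    s->data = p;
    s->cap = cap;
    return 0;
}

/* Append raw bytes (the fwrite replacement); returns 0 on success. */
int mystream_write(struct mystream *s, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (mystream_reserve(s, n) != 0)
        return -1;
    memcpy(s->data + s->len, src, n);
    s->len += n;
    return 0;
}

/* Append formatted text; returns the number of bytes added, or -1. */
int mystream_printf(struct mystream *s, const char *fmt, ...)
{
    va_list ap, ap2;
    int n;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    n = vsnprintf(NULL, 0, fmt, ap);     /* first pass: measure */
    va_end(ap);
    if (n < 0 || mystream_reserve(s, (size_t)n + 1) != 0) {
        va_end(ap2);
        return -1;
    }
    vsnprintf(s->data + s->len, (size_t)n + 1, fmt, ap2);
    va_end(ap2);
    s->len += (size_t)n;                 /* terminating '\0' is not data */
    return n;
}

/* Drop everything staged so far. */
void mystream_discard(struct mystream *s)
{
    s->len = 0;
}

/* Write the staged bytes to `out` and empty the stream; 0 on success. */
int mystream_commit(struct mystream *s, FILE *out)
{
    if (s->len > 0 && fwrite(s->data, 1, s->len, out) != s->len)
        return -1;
    s->len = 0;
    return 0;
}

void mystream_free(struct mystream *s)
{
    free(s->data);
    mystream_init(s);
}
```

With this layout, a validity check can inspect s->data and s->len directly after each write and then call either mystream_discard() or mystream_commit(), so nothing depends on how the C library happens to buffer a FILE.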

As for setvbuf, note the standard. From C11 7.21.3p3:

When a stream is unbuffered, characters are intended to appear from the source or at the destination as soon as possible. Otherwise characters may be accumulated and transmitted to or from the host environment as a block. When a stream is fully buffered, [...]. When a stream is line buffered, [...] Support for these characteristics is implementation-defined, and may be affected via the setbuf and setvbuf functions.

So these buffering modes might not be supported at all. And from C11 7.21.5.6:

The setvbuf function may be used only after the stream pointed to by stream has been associated with an open file and before any other operation (other than an unsuccessful call to setvbuf) is performed on the stream. [...] The contents of the array at any time are indeterminate.

You cannot rely on what the buffer contains at any point. Do not expect to find your data there.

KamilCuk
  • I was a little too focused on streaming into memory. tmpfile would be OK, too. However, I should perhaps have mentioned that my code calls another lib which already uses file I/O. I don't want to rewrite all that stuff in there for array logic. It's very important that no invalid file is generated at the end. Can "tmpfile" and "setvbuf" be combined to maximize the chance that the data is kept in mem? – Wör Du Schnaffzig Jan 11 '21 at 23:52
  • I believe what you cite from the standard. However, I stumbled over this sentence several times when reading through other related S.O. threads: "When a stream is fully buffered, characters are intended to be transmitted to or from the host environment as a block when a buffer is filled". Doesn't it say that the buffer is transmitted at the earliest when it is completely populated (ignoring manual flushes or closing the file)? Or did I understand that completely wrong? – Wör Du Schnaffzig Jan 11 '21 at 23:53
  • `Can "tmpfile" and "setvbuf" be combined to maximize the chance that the data is kept in mem?` There's no point, the file from `tmpfile` is most probably itself stored in memory anyway. – KamilCuk Jan 12 '21 at 07:58
  • `Doesn't it say that the...?` The part `are intended to be transmitted` doesn't mean "are first stored into the buffer and then are _required_ to be transmitted". There's a difference between intention and requirement. `setvbuf` is more of an "advice" than a requirement. While you _may_ depend on such behavior, your code will then be tied to the specific implementation and specific version of the C standard library you tested with, and may surprisingly break in a different environment. Searching for such bugs is just a pain. – KamilCuk Jan 12 '21 at 07:59