Read sysfs file into buffer for string comparison, without having to open it twice

Question

I have written the following code, modified it a bit for simplicity:

FILE *sysfs_file = fopen("/sys/file", "rb");
if (sysfs_file != NULL){

    /* Loop over file handler until EOF to get filesize in bytes */
    FILE *sysfs_file_get_size = fopen("/sys/file", "rb");
    char d = fgetc(sysfs_file_get_size);
    int filesize = 0;
    while (d != EOF){
        d = fgetc(sysfs_file_get_size);
        filesize++;
    }
    fclose(sysfs_file_get_size);

    /* Allocate buffer and copy file into it */
    char *buf = malloc(filesize);
    char c = fgetc(sysfs_file);
    for (int i = 0; i < filesize; i++)
    {
        buf[i] = c;
        c = fgetc(sysfs_file);
    }
    fclose(sysfs_file);

    if(strstr(buf, "foo")) {
        printf("bar.\n");
    }
}

For security reasons, it seemed better to not assume what size the file will be, and first loop through the file to check of how many bytes it consists.

Regular methods of checking the filesize like fseek() or stat() do not work, as the kernel generates the file at the moment that it is being read. What I would like to know: is there a way of reading the file into a buffer in a secure manner, without having to open a file handler twice?

`rewind`? But the size may change after you count the size. Usually one will just allocate a big enough buffer from the beginning. (How big depends on the file.) — user253751, Aug 18 '22 at 20:57
You can find the file size other ways: seek to the end and call ftell, or use fstat — pm100, Aug 18 '22 at 20:59

Andreas Wenzel · Answer 1 · 2022-08-19T04:11:58.890

First of all, in the line

FILE *sysfs_file = fopen("/sys/file", "rb");

the "rb" mode does not make sense. If, as you write, you are looking for a "string", then the file is probably a text file, not a binary file. In that case, you should use "r" instead.

If you are using a POSIX-compliant platform (e.g. Linux), then there is no difference between text mode and binary mode. In that case, it makes even less sense to specifically ask for binary mode, when the file is a text file (even though it is not wrong).

For security reasons, it seemed better to not assume what size the file will be and first loop through the file to check of how many bytes it consists.

It is not a security issue if you limit the number of bytes read to the size of the allocated memory buffer, i.e. to the number of bytes the file originally had. That way, the file will only be truncated (which is generally not a security issue).

However, if you want to ensure that the file is not truncated, then it would probably be best to ignore the initial size of the file and to simply attempt to read as much from the file as possible, until you encounter end-of-file. If the initial buffer is not large enough to store the entire file, then you can use the function realloc to resize the buffer.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//This function will return a pointer to a dynamically
//allocated memory buffer which contains the file data as
//a string (i.e. that is terminated by a null character).
//The function "free" should be called on this data when it
//is no longer required.
char *create_buffer_with_file_data_as_string( FILE *fp )
{
    char *buffer = NULL;
    size_t buffer_size = 16384;
    size_t valid_bytes_in_buffer = 0;

    for (;;) //infinite loop, equivalent to while(1)
    {
        size_t bytes_to_read, bytes_read;
        char *temp;

        //(re)allocate buffer to desired size
        temp = realloc( buffer, buffer_size );
        if ( temp == NULL )
        {
            fprintf( stderr, "Realloc error!\n" );
            free( buffer );
            return NULL;
        }

        //(re)allocation was successful, so we can overwrite the
        //pointer "buffer"
        buffer = temp;

        //calculate number of bytes to read from input file
        //note that we must leave room for adding the terminating
        //null character
        bytes_to_read = buffer_size - valid_bytes_in_buffer - 1;

        //attempt to fill buffer as much as possible with data from
        //the input file
        bytes_read = fread(
            buffer + valid_bytes_in_buffer,
            1,
            bytes_to_read,
            fp
        );

        //break out of loop if there is no data to process
        if ( bytes_read == 0 )
            break;

        //update number of valid bytes in the buffer
        valid_bytes_in_buffer += bytes_read;

        //double the size of the buffer (will take effect in
        //the next loop iteration)
        buffer_size *= 2;
    }

    //verify that no error occurred
    if ( ferror( fp ) )
    {
        fprintf( stderr, "File I/O error occurred!\n" );
        free( buffer );
        return NULL;
    }

    //add terminating null character to data, so that it is a
    //valid string that can be passed to the function "strstr"
    buffer[valid_bytes_in_buffer++] = '\0';

    //shrink buffer to required size
    {
        char *temp;

        temp = realloc( buffer, valid_bytes_in_buffer );

        if ( temp == NULL )
        {
            fprintf( stderr, "Warning: Shrinking failed!\n" );
        }
        else
        {
            buffer = temp;
        }
    }

    //the function was successful, so return a pointer to 
    //the data
    return buffer;
}

int main( void )
{
    FILE *fp;
    char *data;

    //attempt to open file
    fp = fopen( "filename", "r" );
    if ( fp == NULL )
    {
        fprintf( stderr, "Error opening file!\n" );
        exit( EXIT_FAILURE );
    }

    //call the function
    data = create_buffer_with_file_data_as_string( fp );
    if ( data == NULL )
    {
        fprintf(
            stderr,
            "An error occurred in the function:\n"
            "    create_buffer_with_file_data_as_string\n"
        );
        fclose( fp );
        exit( EXIT_FAILURE );
    }

    //the file is no longer needed, so close it
    fclose( fp );

    //search data for target string
    if( strstr( data, "target" ) != NULL )
    {
        printf("Found \"target\".\n" );
    }
    else
    {
        printf("Did not find \"target\".\n" );
    }

    //cleanup
    free( data );
}

For the input

This is a test file with a target.

this program has the following output:

Found "target".

Note that every time I am calling realloc, I am doubling the size of the buffer. I am not adding a constant amount to the size of the buffer. This is important, for the following reason:

Let's say that the file has a size of 160 MB (megabytes). In my program, I have an initial buffer size of about 16 KB (kilobytes). If I didn't double the size of the buffer every time I call realloc, but instead added a constant amount of bytes, for example added another 16 KB, then I would need to call realloc 10,000 times. Every time I call realloc, the content of the entire buffer may have to be copied by realloc, which means that on average, 80 MB may have to be copied every time, which is 800 GB (nearly a terabyte) in total. This would be highly inefficient.

However, if I instead double the size of the memory buffer (i.e. let the buffer grow exponentially), then it is guaranteed that the amount of data that must be copied will never be more than double the amount of the actual data. So, in my example above, it is guaranteed that never more than 320 MB will have to be copied by realloc.

Fe2O3 · Answer 2 · 2022-09-21T11:44:21.277

You could just estimate what you need in blocks and grow the input buffer as needed...
This is untested, but gives the flavour of what should work.
This version attempts to load the entire file before investigating its content.

FILE *fp = fopen( "/sys/file", "rb" );
if( fp == NULL )
    return -1;

#define BLK_SIZE 1024
char *buf = malloc( BLK_SIZE );
if( buf == NULL ) {
    fclose( fp ); // do not leak the stream on failure
    return -1;
}
char *readTo = buf;
size_t bufCnt = 0;
for( ;; ) {
    size_t inCnt = fread( readTo, sizeof *readTo, BLK_SIZE, fp );
    bufCnt += inCnt;
    if( inCnt < BLK_SIZE )
        break;

    // possibly test for EOF here

    char *tmp = realloc( buf, bufCnt + BLK_SIZE );
    if( tmp == NULL ) {
        free( buf ); // realloc failure leaves the old block allocated
        fclose( fp );
        return -1;
    }
    buf = tmp;
    readTo = buf + bufCnt;
}
fclose( fp );

printf( "Got %zu valid bytes in buffer\n", bufCnt ); // %zu matches size_t

/* do stuff with *buf */

free( buf );

Hopefully the final EDIT of version 2:

I am grateful to @Andreas Wenzel for his cheerful and meticulous testing and comments that turned earlier (incorrect!) versions of my attempts into this prototype.

The objective is to find a string of bytes in a file.

In this prototype, single "buffer loads" are examined sequentially until the first instance of the target is found or EOF reached. This seems to cope with cases when the target bytes are split across two buffer loads. This uses a ridiculously small 'file' and small buffer that would, of course, be scaled up in the real world.

Making this more efficient is left as an exercise for the reader.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>

// Simulated file with text
char inBytes[] = "The cute brown fox jumps over the dogs and bababanana and stuff.";
char *pFrom = NULL;
size_t nLeft = sizeof inBytes - 1;

// Simulated 'fopen()'.
bool myOpen( void ) { nLeft = strlen( pFrom = inBytes ); return true; }

// Simulated 'fread()'. (only 1 "file pointer in use")
size_t myRead( char *buf, size_t cnt ) {
    if( nLeft == 0 ) return 0; // EOF

    size_t give = nLeft <= cnt ? nLeft : cnt;

    memcpy( buf, pFrom, give );

    pFrom += give;
    nLeft -= give;

    return give;
}

// Look for string using different buffer sizes to prove target split functions
bool foobar( char srchfor[], int bufSize ) {
    bool found = false;
    int matched = 0;
    int lenWant = strlen( srchfor ); // # of chars to match

    // RAM buffer includes room for "wrapping"
    char *iblk = (char*)malloc( lenWant + bufSize );
    if( iblk == NULL ) {
        fprintf( stderr, "Malloc failed!!!\n" );
        exit( 1 );
    }

    // simulate loading sequential blocks into a fixed size buffer.
    myOpen();

    size_t inBuf = 0, got;
    char *pTo = iblk; // Read to location not always start of buffer
    while( ( got = myRead( pTo, bufSize ) ) != 0 ) { // stop at EOF, even if a partial match is pending
        inBuf += got;

        printf( "'%.*s'  ", (int)inBuf, iblk ); // Show what's in buffer

        // The mill where matching is carried out
        for( size_t i = 0; i < inBuf && matched < lenWant; )
            if( srchfor[ matched ] == iblk[i] )
                matched++, i++;
            else if( matched )
                i -= matched - 1, matched = 0; // rewind a bit and try again
            else i++;

        // Lucky?
        if( matched == lenWant ) { printf( "Ahha!\n" ); found = true; break; }

        if( matched == 0 ) {
            pTo = iblk, inBuf = 0; // reset things
            printf( "nothing\n" );
        } else {
            // preserve what did match, and read location is offset
            printf( "got something\n" );
            memmove( iblk, iblk + inBuf - matched, matched );
            pTo = iblk + matched; // next read lands right after the preserved bytes
            inBuf = matched;
            matched = 0;
        }
    }
    free( iblk );

    return found;
}

int main() {

    char *target = "babanana";

    // Test with different buffer sizes (to split target across successive reads )
    for( int sz = 20; sz < 27; sz += 2 )
        printf( "bufSize = %d ... %s\n\n",
            sz, foobar( target, sz ) ? "Found!": "Not Found." );

    return 0;
}

Output:

'The cute brown fox j'  nothing
'umps over the dogs a'  nothing
'nd bababanana and st'  Ahha!
bufSize = 20 ... Found!

'The cute brown fox jum'  nothing
'ps over the dogs and b'  got something
'bababanana and stuff.'  Ahha!
bufSize = 22 ... Found!

'The cute brown fox jumps'  nothing
' over the dogs and babab'  got something
'babanana and stuff.'  Ahha!
bufSize = 24 ... Found!

'The cute brown fox jumps o'  nothing
'ver the dogs and bababanan'  got something
'babanana and stuff.'  Ahha!
bufSize = 26 ... Found!

EDIT3: That memmove() and the buffer size has been an annoyance for some time now.

Here's a version that takes one character of input at a time (fgetc() compatible), uses a heap buffer that is the same size as the target, uint8_t allows a search for binary targets, implements a circular buffer and has a lot of fiddley index manipulation. It's not Knuth, but neither am I...

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

size_t srch( uint8_t srch[], size_t nS, uint8_t targ[], size_t nT ) {
    uint8_t c, skip = 0, *q = (uint8_t*)malloc( nT );
    if( q == NULL ) {
        fprintf( stderr, "Malloc failed!!!\n" );
        exit( 1 );
    }

    size_t head = 0, tail = 0, ti = 0, tiS = 0, i = 0;
    while( ti < nT && i < nS ) {
        c = skip ? c : srch[i++]; // getc()
        skip = 0;
        if( c == targ[ti] ) {
            q[tail++] = c;
            tail %= nT;
            ti++;
        } else if( ti ) {
            skip = 1;
            do{
                while( --ti && q[ head = (head + 1) % nT ] != targ[ 0 ] ); // avoids modifying head twice in one expression
                for( tiS = 0; q[ (head+tiS)%nT ] == targ[ tiS ]; tiS++ );
            } while( tiS < ti );
        }
    }
    free( q );

    return ti == nT ? i - nT : nS; // found ? offset : impossible offset
}

int main() {
    char *in =
        "The cute brown fox jumps "
        "over the dogs babababananana stuff";
    size_t inSize = strlen( in );

    char *targets[] = {
        "The", "the", "ff",
        "babanana", "banana",
        "jumps", " cute",
        "orange",
    };
    int nTargs = sizeof targets/sizeof targets[0];

    for( int i = 0; i < nTargs; i++ ) {
        size_t val = strlen( targets[i] );

        val = srch( (uint8_t*)in, inSize, (uint8_t*)targets[i], val );

        if( val == inSize )
            printf( "%s ... not found\n", targets[i] );
        else
            printf( "%s ... %.15s\n", targets[i], in + val );
    }

    return 0;
}

Output

The ... The cute brown
the ... the dogs and ba
ff ... ff
babanana ... babananana and
banana ... bananana and st
jumps ... jumps over the
 cute ...  cute brown fox
orange ... not found

Probably better not to return `-1` (`int`) to the shell, see [POSIX return - EXIT STATUS](https://pubs.opengroup.org/onlinepubs/009695399/utilities/return.html) (return should be limited to 8-bits unsigned) That's why `stdlib.h` defines `EXIT_SUCCESS` as `0` and `EXIT_FAILURE` as `1`. — David C. Rankin, Aug 19 '22 at 03:45
I like where you are going with a page-size read. A good approach is to use the fixed buffer, read (as text), check if the return is less than page-size, if so, you are good in what you have, otherwise, then start the loop to `realloc()` 2X page-size and read. There are many good schemes to dynamically allocate/grow a memory block -- the key is to minimize the copies and number of times the memory must be grown. A fixed read and then dynamic if data exceeds the fixed size will avoid allocation in most circumstances with sysfs. — David C. Rankin, Aug 19 '22 at 04:10
@AndreasWenzel Just about to attack it ... AGAIN ... in light of your good testing... Naively optimistic... Typical programmer trait.... You are, of course, welcome to post your solution as you wish... I've got a notion (less than efficient) that I want to pursue.. Looking forward to you "blowing the next version out of the water"... again... :-) — Fe2O3, Aug 20 '22 at 04:11
FYI: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm That examines successive input bytes, so it will work without buffering. And it's O(n) in the length of the text being searched. — rici, Aug 20 '22 at 05:52
@rici Yes! First heard of KMP a few days ago when question came up on SO... Thank you... My head already spins with keeping track of this inefficient version's pointers and counters. :-) (And, where's the fun in using an algorithm developed by a genius? :-) — Fe2O3, Aug 20 '22 at 05:55
[Here](https://godbolt.org/z/nrbxKGrK7) is my version of the string search program that searches the file directly. I have tested it a bit and it seems to work, but feel free to point out some cases in which it fails, as I have done with you. :-) My code is quite complex, though. It has 4 levels of loop nesting and therefore requires lots of `goto`s to break out of the nested loops. — Andreas Wenzel, Aug 20 '22 at 06:40
I like it how you are automatically testing your program with different buffer sizes and displaying what is happening nicely. — Andreas Wenzel, Aug 20 '22 at 07:27
@AndreasWenzel I've applied the 'touch-ups' you suggested. Thank you. :-) Your code seems to my (too forgiving) eyes to be well thought out. I suspect `memchr()` becomes a single instruction on modern CPUs as compared to my looping... Still, for the small 'mill' in my code, I'd be willing to wait a few extra mS for the result... :-) Again, thank you for your careful testing and feedback. "Humility is a hard won virtue." :-) — Fe2O3, Aug 20 '22 at 11:06
I have now deleted my comments pointing out the mistakes in your second code snippet, because they are no longer up to date, since you have corrected the mistakes. Since your answer is referring to my now-deleted comments, you may want to delete that reference. — Andreas Wenzel, Aug 20 '22 at 13:43
@AndreasWenzel I've modified the text, but still acknowledge that it was YOUR testing and feedback that turned SEVERAL optimistic yet insufficient versions into something that seems to get the job done (especially without a single `realloc()` in sight. :-) Thank you. :-) — Fe2O3, Aug 20 '22 at 21:07
I have now added [a second answer of mine](https://stackoverflow.com/a/73431557/12149471) to the question (which I posted as a separate answer), in order to provide an alternative solution, which reads only one character at a time. I believe that is the simplest solution to the problem, but not that efficient. — Andreas Wenzel, Aug 21 '22 at 04:29
@AndreasWenzel Slow day, so I finally got 'round to a version... well... It's highlighted as EDIT3 above... Time to put this one to bed... `:-)` — Fe2O3, Sep 21 '22 at 11:45

Andreas Wenzel · Answer 3 · 2022-08-21T05:34:40.587

In my other answer, I have answered your question on how to read the entire file into a memory buffer, in order to search it. However, in this answer, I will present an alternative solution to searching a file for a string, in which the file is searched directly, so that it is not necessary to read the entire file into memory.

In this program, I read a file character by character using getc and whenever I encounter the first character of the target string, I continue reading characters in order to compare these characters with the remaining characters of the target string. If any of these characters does not match, I push back all characters except the first one onto the input stream using ungetc, and then continue searching for the first character of the target string.

#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    FILE *fp;
    int c;

    //define target string
    const char target[] = "banana";
    const size_t target_length = sizeof target - 1;

    //make sure that length of target string is at least 1
    _Static_assert(
        sizeof target >= 2,
        "target string must have at least one character"
    );

    //attempt to open file
    fp = fopen( "filename", "r" );
    if ( fp == NULL )
    {
        fprintf( stderr, "Error opening file!\n" );
        exit( EXIT_FAILURE );
    }

    //read one character per loop iteration
    while ( ( c = getc(fp) ) != EOF )
    {
        //compare first character
        if ( c == (unsigned char)target[0] )
        {
            //compare remaining characters
            for ( size_t i = 1; i < target_length; i++ )
            {
                if ( ( c = getc(fp) ) != (unsigned char)target[i] )
                {
                    //strings are not identical, so push back all
                    //characters

                    //push back last character
                    if ( ungetc( c, fp ) == EOF && c != EOF )
                    {
                        fprintf( stderr, "Unexpected error in ungetc!\n" );
                        goto cleanup;
                    }

                    //push back all other characters, except for
                    //the first character
                    for ( const char *p = target + i - 1; p != target; p-- )
                    {
                        if ( ungetc( *p, fp ) == EOF )
                        {
                            fprintf(
                                stderr,
                                "Error with function \"ungetc\"!\n"
                                "This error is probably due to this function\n"
                                "not supporting a sufficiently large\n"
                                "pushback buffer."
                            );
                            goto cleanup;
                        }
                    }

                    //go to next outer loop iteration
                    goto continue_outer_loop;
                }
            }

            //found target string
            printf( "Found!\n" );
            goto cleanup;
        }

    continue_outer_loop:
        continue;
    }

    //did not find target string
    printf( "Not found!\n" );

cleanup:
    fclose( fp );
}

However, this solution has one big problem. The size of the pushback buffer is only guaranteed to be a single character by ISO C. Although some platforms have pushback buffers up to 4 KiB, some platforms actually only support a single character.

Therefore, in order for this solution to be portable, it would be necessary to implement a sufficiently large pushback buffer yourself using your own version of ungetc and fgetc (which I call my_ungetc and my_fgetc):

#include <stdio.h>
#include <stdlib.h>

struct pushback_buffer
{
    char data[16384];
    char *end;
    char *p;

    FILE *fp;
};

int my_ungetc( int c, struct pushback_buffer *p )
{
    //like ungetc, refuse to push back EOF
    if ( c == EOF )
    {
        return EOF;
    }

    //verify that buffer is not full
    if ( p->p == p->data )
    {
        //buffer is full
        return EOF;
    }

    *--p->p = (char)c;

    return 0;
}

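int my_fgetc( struct pushback_buffer *p )
{
    //determine whether buffer is empty
    if ( p->p == p->end )
    {
        //pass on request to getc
        return getc( p->fp );
    }

    //like getc, return the character as an unsigned char
    //converted to int, so that it can never compare equal to EOF
    return (unsigned char)*p->p++;
}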

int main( void )
{
    static struct pushback_buffer pbb;
    int c;

    //define target string
    const char target[] = "banana";
    const size_t target_length = sizeof target - 1;

    //make sure that length of target string is at least 1
    _Static_assert(
        sizeof target >= 2,
        "target string must have at least one character"
    );

    //initialize pushback buffer except for "fp"
    pbb.end = pbb.data + sizeof pbb.data;
    pbb.p = pbb.end;

    //open file and write FILE * to pushback buffer
    pbb.fp = fopen( "filename", "r" );
    if ( pbb.fp == NULL )
    {
        fprintf( stderr, "Error opening file!\n" );
        exit( EXIT_FAILURE );
    }

    //read one character per loop iteration
    while ( ( c = my_fgetc(&pbb) ) != EOF )
    {
        //compare first character
        if ( c == (unsigned char)target[0] )
        {
            //compare remaining characters
            for ( size_t i = 1; i < target_length; i++ )
            {
                if ( ( c = my_fgetc(&pbb) ) != (unsigned char)target[i] )
                {
                    //strings are not identical, so push back all
                    //characters

                    //push back last character
                    if ( my_ungetc( c, &pbb ) == EOF && c != EOF )
                    {
                        fprintf( stderr, "Unexpected error in ungetc!\n" );
                        goto cleanup;
                    }

                    //push back all other characters, except for
                    //the first character
                    for ( const char *p = target + i - 1; p != target; p-- )
                    {
                        if ( my_ungetc( *p, &pbb ) == EOF )
                        {
                            fprintf(
                                stderr,
                                "Error with function \"ungetc\"!\n"
                                "This error is probably due to this function\n"
                                "not supporting a sufficiently large\n"
                                "pushback buffer."
                            );
                            goto cleanup;
                        }
                    }

                    //go to next outer loop iteration
                    goto continue_outer_loop;
                }
            }

            //found target string
            printf( "Found!\n" );
            goto cleanup;
        }

    continue_outer_loop:
        continue;
    }

    //did not find target string
    printf( "Not found!\n" );

cleanup:
    fclose( pbb.fp );
}

However, reading a file a single character at a time is not very efficient, especially on platforms which support multithreading, because this requires getc to acquire a lock every time. Some platforms offer platform-specific alternatives, such as getc_unlocked on POSIX-compliant platforms (e.g. Linux) and _getc_no_lock on Windows. But even when using these functions, reading one character at a time from the input stream will be rather slow. It would be more efficient to read a whole block of several kilobytes at once.

Here is a completely different solution of mine which reads a whole block at once, instead of one character at a time. However, this solution is rather complex, because it must handle two buffers at once and requires 4 levels of nested loops and multiple gotos to break out of these nested loops.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFFER_SIZE 8192

struct buffer
{
    char data[BUFFER_SIZE];
    size_t valid_chars;
};

size_t read_next_block( char buffer[static BUFFER_SIZE], FILE *fp );

int main( void )
{
    //define target string
    const char target[] = "banana";
    const size_t target_length = sizeof target - 1;

    //verify that length of target string is at least 1
    _Static_assert(
        sizeof target >= 2,
        "target string must have at least one character"
    );

    //verify that target string is not so long that
    //more than two buffers would be required
    _Static_assert(
        BUFFER_SIZE > sizeof target,
        "target string too long"
    );

    //other declarations
    FILE *fp;
    struct buffer buffers[2];
    struct buffer *current = NULL, *next = NULL;

    //attempt to open file
    fp = fopen( "filename", "r" );
    if ( fp == NULL )
    {
        fprintf( stderr, "Error opening file!\n" );
        exit( EXIT_FAILURE );
    }

    //read one block per loop iteration
    do
    {
        char *p, *q;
        size_t chars_left;

        if ( next == NULL )
        {
            //use the first buffer
            current = &buffers[0];

            //load the next block
            current->valid_chars = read_next_block( current->data, fp );
        }
        else
        {
            current = next;
            next = NULL;
        }

        p = current->data;
        chars_left = current->valid_chars;

        //search for next occurrence of starting character
        while (
            chars_left != 0
            &&
            ( q = memchr( p, target[0], chars_left ) ) != NULL
        )
        {
            chars_left -= q - p;
            p = q;

            for ( size_t i = 1; i < target_length; i++ )
            {
                //swap to next block, if necessary
                if ( i == chars_left )
                {
                    //check whether we have reached end-of-file
                    if ( current->valid_chars != BUFFER_SIZE )
                    {
                        goto no_match;
                    }

                    //load next block, if necessary
                    if ( next == NULL )
                    {
                        //make "next" point to the other buffer
                        next = current == &buffers[0] ? &buffers[1] : &buffers[0];

                        //load the next block
                        next->valid_chars = read_next_block( next->data, fp );
                    }

                    for ( size_t j = 0; i < target_length; i++, j++ )
                    {
                        //check whether we have reached end-of-file
                        if ( j == next->valid_chars )
                        {
                            //the strings don't match
                            goto no_match;
                        }

                        if ( next->data[j] != target[i] )
                        {
                            //the strings don't match
                            goto no_match;
                        }
                    }

                    //the strings match
                    goto match;
                }

                //go to next outer loop iteration if the
                //strings do not match
                if ( p[i] != target[i] )
                {
                    //the strings don't match
                    goto no_match;
                }
            }

            //the strings match
            goto match;

        no_match:
            
            p++;
            chars_left--;
        }

    } while ( current->valid_chars == BUFFER_SIZE );

    //no match was found
    printf( "Not found!\n" );
    goto cleanup;

match:

    //the strings match
    printf( "Found!\n" );
    goto cleanup;

cleanup:
    fclose( fp );
}

size_t read_next_block( char buffer[static BUFFER_SIZE], FILE *fp )
{
    size_t bytes_read;

    bytes_read = fread( buffer, 1, BUFFER_SIZE, fp );

    if ( bytes_read == 0 && ferror( fp ) )
    {
        fprintf( stderr, "Input error!\n" );
        exit( EXIT_FAILURE );
    }

    return bytes_read;
}

I really don't want to be a pain... The "raw getc()" version has the file open as a FILE stream... Rather than "pushback", would you consider `fseek( )` with a negative offset and SEEK_CUR as a way to rewind? ... How many hours at this one problem??? :-) — Fe2O3, Aug 21 '22 at 06:52
@Fe2O3: According to [§7.21.9.2 ¶4 of the ISO C11 standard](http://port70.net/~nsz/c/c11/n1570.html#7.21.9.2p4), calling `fseek` with a negative offset on a text stream will invoke undefined behavior (POSIX defines this behavior, though). However, it would be possible to call `ftell` to obtain the offset of the second character and then, at a later time, to `fseek` to that offset using `SEEK_SET`. So yes, you have a good idea that would work. But I'm afraid that seeking in a file would cause the buffers of the stream to be flushed, which would be bad for performance. — Andreas Wenzel, Aug 21 '22 at 06:59
Thanks. If you still want to play with this problem, another approach that I considered was a "circular buffer", or perhaps a "queue" sized to the target string length. While 'next char seen' is "good", the queue gets longer (beginning with 1). If a bad char joins the back of the queue while it's too short (not matching what is wanted), characters are 'released' from the front until the front of the queue matches target[ 0 ]... For most 'mismatches', the queue length would be zero... Many ways to skin a cat! :-) — Fe2O3, Aug 21 '22 at 08:24

score 0 · Answer 4 · answered Aug 18 '22 at 21:15

If the kernel is creating the file as you read it and there is a risk that the size of it will be different the next time you read it, then your only real bet is to read it into a buffer before you know how large the file is. Start by allocating a LARGE buffer - big enough that it SHOULD accept the entire file - then call read() to get (at most) that many bytes. If there's still more to be read, you can realloc() the buffer you were writing into. Repeat the realloc() as often as necessary.
