
How do I read text from a file into a dynamic array of characters? I found a way to count the number of characters in a file and create a dynamic array, but I can't figure out how to assign the characters to the elements of the array.

FILE *text;
char* Str;
int count = 0;
char c;
text = fopen("text.txt", "r");
while(c = (fgetc(text))!= EOF)
{
  count ++;
}
Str = (char*)malloc(count * sizeof(char));

fclose(text);

Sasha
  • `fgetc()` returns `int`, not `char`. Truncating the value returned from `fgetc()` to a `char` can cause your code to improperly identify `EOF`, because `EOF` is a value specifically chosen so that it **doesn't** fit into a `char` value. – Andrew Henle Mar 27 '20 at 10:52
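To illustrate the comment above, here is a minimal sketch of the question's counting loop with both problems fixed: `c` is an `int`, and the assignment is parenthesized so that `c` receives the character instead of the result of `!=`. The function name `count_chars` is hypothetical, and the stream is assumed to be already open.

```c
#include <stdio.h>

/* Count the characters remaining in an already-open stream.
 * c must be an int: fgetc() returns either an unsigned char
 * converted to int, or EOF, and EOF must stay distinguishable
 * from every valid character value. */
size_t count_chars(FILE *fp)
{
    size_t count = 0;
    int c;

    /* the assignment is parenthesized so c gets the character,
     * not the result of the != comparison */
    while ((c = fgetc(fp)) != EOF)
    {
        count++;
    }
    return count;
}
```

With `char c`, a byte such as 0xFF can compare equal to `EOF` on platforms where `char` is signed, ending the loop early; with `int c` it cannot.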

3 Answers


There is no portable, standard-conforming way in C to know in advance how many bytes can be read from a FILE stream.

First, the stream might not even be seekable - it can be a pipe or a terminal or even a socket connection. On such streams, once you read the input it's gone, never to be read again. You can push back one char value, but that's not enough to be able to know how much data remains to be read, or to reread the entire stream.
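A minimal sketch of that one-character pushback, using `ungetc()` to peek at the next character without consuming it. The helper name `peek_char` is hypothetical. Because the standard only guarantees a single character of pushback, this trick can't be stretched into rereading a whole stream.

```c
#include <stdio.h>

/* Look at the next character of a stream without consuming it.
 * Only one character of pushback is guaranteed by the C standard. */
int peek_char(FILE *fp)
{
    int c = fgetc(fp);
    if (c != EOF)
    {
        /* push it back; the next fgetc() returns it again */
        ungetc(c, fp);
    }
    return c;
}
```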

And even if the stream is to a file that you can seek on, you can't use fseek()/ftell() in portable, strictly-conforming C code to know how big the file is.

If it's a binary stream, you cannot use fseek() to seek to the end of the file - that's explicitly undefined behavior per the C standard:

... A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.

Footnote 268 even says:

Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream ...

So you can't portably use fseek() in a binary stream.

And you can't use ftell() to get a byte count for a text stream. Per the C standard again:

For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.

Systems do exist where the value returned from ftell() is nothing like a byte count.

The only portable, conforming way to know how many bytes you can read from a stream is to actually read them, and you can't rely on being able to read them again.
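As a sketch of measuring a stream by actually reading it, assuming you only need the count and can afford to consume the data, the following reads in fixed-size chunks with `fread()` and adds up what it got. The name `count_remaining_bytes` and the 4096-byte buffer size are arbitrary choices for illustration.

```c
#include <stdio.h>

/* Portably count the bytes remaining in a stream by reading them.
 * The data is consumed: on a pipe or terminal it cannot be read again.
 * *error is set to 1 if a read error (rather than end-of-file) stopped
 * the loop, otherwise to 0. */
size_t count_remaining_bytes(FILE *fp, int *error)
{
    char buf[4096];
    size_t total = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
    {
        total += n;
    }
    *error = ferror(fp) ? 1 : 0;
    return total;
}
```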

If you want to read the entire stream into memory, you have to continually reallocate memory, or use some other dynamic scheme.

This is a very inefficient but portable and strictly-conforming way to read the entire contents of a stream into memory (all error checking and header files are omitted for algorithm clarity and to keep the vertical scrollbar from appearing - it really needs error checking and will need the proper header files):

// get input stream with `fopen()` or some other manner
FILE *input = ...

size_t count = 0;
char *data = NULL;

for ( ;; )
{
    int c = fgetc( input );
    if ( c == EOF )
    {
        break;
    }

    data = realloc( data, count + 1 );

    data[ count ] = c;

    count++;
}

// optional - terminate the data with a '\0'
// to treat the data as a C-style string
data = realloc( data, count + 1 );
data[ count ] = '\0';
count++;

That will work no matter what the stream is.
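The one-byte-at-a-time `realloc()` above is what makes it slow. A common alternative, sketched here under the assumption that up to doubling the buffer is acceptable memory overhead, grows the capacity geometrically and adds the error checking the code above leaves out. The function `read_all` and its signature are hypothetical, not a standard API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read everything from fp into a malloc()'d, '\0'-terminated buffer.
 * The buffer doubles when full, so the number of realloc() calls is
 * logarithmic in the input size instead of one per byte.
 * On success returns the buffer and stores the byte count (excluding
 * the terminator) in *len; on failure returns NULL. */
char *read_all(FILE *fp, size_t *len)
{
    size_t cap = 256;
    size_t count = 0;
    char *data = malloc(cap);
    if (data == NULL)
    {
        return NULL;
    }

    int c;
    while ((c = fgetc(fp)) != EOF)
    {
        /* always keep one spare byte for the terminating '\0' */
        if (count + 1 >= cap)
        {
            size_t newCap = cap * 2;   /* size_t overflow check omitted */
            char *tmp = realloc(data, newCap);
            if (tmp == NULL)
            {
                free(data);            /* don't leak the old block */
                return NULL;
            }
            data = tmp;
            cap = newCap;
        }
        data[count++] = (char) c;
    }

    if (ferror(fp))
    {
        free(data);
        return NULL;
    }

    data[count] = '\0';
    *len = count;
    return data;
}
```

The caller owns the returned buffer and must `free()` it.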

On a POSIX-style system such as Linux, you can use fileno() and fstat() to get the size of a file (again, all error checking and header files are omitted):

char *data = NULL;
FILE *input = ...

int fd = fileno( input );

struct stat sb;

fstat( fd, &sb );

if ( S_ISREG( sb.st_mode ) )
{
    // sb.st_size + 1 for C-style string
    // assign to the outer `data` - declaring a new `char *data`
    // here would shadow it and leave the outer pointer NULL
    data = malloc( sb.st_size + 1 );
    data[ sb.st_size ] = '\0';
}

// now if data is not NULL you can read into the buffer data points to
// if data is NULL, see above code to read char-by-char

// this tries to read the entire stream in one call to fread(),
// looping only to handle short reads
// there are a lot of other ways to do this
if ( data != NULL )
{
    size_t fileSize = ( size_t ) sb.st_size;
    size_t totalRead = 0;
    while ( totalRead < fileSize )
    {
        size_t bytesRead = fread( data + totalRead, 1, fileSize - totalRead, input );
        if ( bytesRead == 0 )
        {
            // EOF or read error - stop instead of looping forever
            break;
        }
        totalRead += bytesRead;
    }
}

The above code should work on Windows, too, although you may get some compiler warnings or have to use _fileno(), _fstat() and struct _stat instead.

You may also need to define the S_ISREG() macro on Windows:

#define S_ISREG(m) (((m) & S_IFMT) == S_IFREG)
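The fstat() approach can be wrapped in a small helper that reports "no usable size" instead of guessing, so the caller knows when to fall back to reading the stream incrementally. This is a POSIX-only sketch (it won't compile as-is on Windows without the _fileno()/_fstat() substitutions), and the name `regular_file_size` and the `-1` sentinel are assumptions. Note that `st_size` reflects what the OS sees, so data still sitting in a `FILE` output buffer isn't counted until you call `fflush()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* POSIX-only: return the size in bytes of the regular file behind fp,
 * or -1 if fp is not a regular file (pipe, terminal, socket, ...)
 * or if fileno()/fstat() fails. */
long long regular_file_size(FILE *fp)
{
    struct stat sb;
    int fd = fileno(fp);

    if (fd == -1 || fstat(fd, &sb) != 0)
    {
        return -1;
    }
    if (!S_ISREG(sb.st_mode))
    {
        return -1;
    }
    return (long long) sb.st_size;
}
```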

Andrew Henle

For a binary file, you can use fseek and ftell to know the size without reading the file, allocate the memory and then read everything:

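...
text = fopen("text.txt", "r");
fseek(text, 0, SEEK_END);
long size = ftell(text);     // size of the file as reported by ftell()
fseek(text, 0, SEEK_SET);    // go back to the beginning before reading
char *ix = Str = malloc(size);
int ch;                      // fgetc() returns int, not char
while((ch = fgetc(text)) != EOF)
{
  *ix++ = ch;
}
count = ix - Str;       // get the exact count...
...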

For a text file, on a system that has a multi-byte end of line (like Windows which uses \r\n), this will allocate more bytes than required. You could of course scan the file twice, first time for the size and second for actually reading the characters, but you can also just ignore the additional bytes, or you could realloc:

...
count = ix - Str;
Str = realloc(Str, count);
...
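One caveat with `Str = realloc(Str, count);`: if `realloc()` fails it returns `NULL`, and assigning that straight back to `Str` loses (leaks) the original block. Below is a minimal sketch of a safer shrink using a temporary pointer; the helper name `shrink_buffer` is hypothetical.

```c
#include <stdlib.h>

/* Resize a buffer without losing it on failure.
 * On failure the original buffer is returned unchanged, which is
 * harmless when the goal was only to give back unused space. */
char *shrink_buffer(char *buf, size_t newSize)
{
    if (newSize == 0)
    {
        /* avoid realloc(p, 0), whose behavior varies between implementations */
        newSize = 1;
    }
    char *tmp = realloc(buf, newSize);
    return (tmp != NULL) ? tmp : buf;
}
```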

Of course, in a real-world program you should check the return values of all I/O and allocation functions: fopen, fseek, ftell, malloc and realloc.
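As an example of what checking those return values might look like for the seek-based size query, here is a sketch; `stream_size_via_seek` is a hypothetical name. It only detects failures such as a non-seekable stream. It does not make the result a portable byte count, which, as noted in the comments on this answer, strictly conforming C does not guarantee.

```c
#include <stdio.h>

/* Seek to the end, record ftell(), then seek back to the start,
 * checking every call. Returns -1 on any failure. */
long stream_size_via_seek(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
    {
        return -1;
    }
    long size = ftell(fp);
    if (size == -1L)
    {
        return -1;
    }
    if (fseek(fp, 0L, SEEK_SET) != 0)
    {
        return -1;
    }
    return size;
}
```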

Serge Ballesta
  • *For a binary file, you can use fseek and ftell to know the size without reading the file* That is true only if your operating system provides guarantees beyond what standard C does. Under strictly conforming C, `fseek()` to the end of a binary file is explicitly undefined behavior. Per [footnote 268 of the C11 standard](https://port70.net/~nsz/c/c11/n1570.html#note268): "Setting the file position indicator to end-of-file, as with `fseek(file, 0, SEEK_END)`, has undefined behavior for a binary stream ..." This also assumes the file is seekable - it doesn't have to be. – Andrew Henle Mar 27 '20 at 10:49
  • For a text file, the [value from `ftell()` in strictly-conformant C also has no relation to a byte count](https://port70.net/~nsz/c/c11/n1570.html#7.21.9.4p2): "For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read." This is the case on [z/OS](https://en.wikipedia.org/wiki/Z/OS), for example. – Andrew Henle Mar 27 '20 at 10:56

To just do what you asked for, you would have to read the whole file again:

...
// go back to the beginning
fseek(text, 0L, SEEK_SET);
// read
size_t readsize = fread(Str, sizeof(char), count, text);
if(readsize != (size_t)count) {
  printf("woops - something bad happened\n");
}

// do stuff with it
// ...

fclose(text);

This way, though, your string is not null-terminated. That will get you into trouble if you pass it to common string functions such as strlen.

To properly null terminate your string you would have to allocate space for one additional character and set that last one to '\0':

...
// allocate count + 1 (for the null terminator) 
Str = (char*)malloc((count + 1) * sizeof(char));    

// go back to the beginning
fseek(text, 0L, SEEK_SET);
// read
size_t readsize = fread(Str, sizeof(char), count, text);
if(readsize != (size_t)count) {
  printf("woops - something bad happened\n");
}
// add null terminator
Str[count] = '\0';

// do stuff with it
// ...

fclose(text);

Now, if you want to know the number of characters in the file without counting them one by one, you can get that number more efficiently:

...
text = fopen("text.txt", "r");

// seek to the end of the file
fseek(text, 0L, SEEK_END);
// get your current position in that file
count = ftell(text);

// allocate count + 1 (for the null terminator) 
Str = (char*)malloc((count + 1) * sizeof(char));    
...

Putting this together in a more structured form:

// open file
FILE *text = fopen("text.txt", "r");

// seek to the end of the file
fseek(text, 0L, SEEK_END);
// get your current position in that file
long count = ftell(text);

// allocate count + 1 (for the null terminator) 
char* Str = (char*)malloc((count + 1) * sizeof(char));    

// go back to the beginning
fseek(text, 0L, SEEK_SET);
// read
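size_t readsize = fread(Str, sizeof(char), count, text);
if(ferror(text)) {
  printf("woops - something bad happened\n");
}

fclose(text);

// add null terminator after the bytes actually read
// (on a text stream this can be fewer than count, e.g. "\r\n" becomes "\n" on Windows)
Str[readsize] = '\0';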

// do stuff with it
// ...

Edit:

As Andrew Henle pointed out, not every FILE stream is seekable, and you can't even rely on being able to read the file again (or on the file having the same length and content when you read it again). Even though this is the accepted answer, if you don't know in advance what kind of file stream you're dealing with, his solution is definitely the way to go.

Stefan Riedel