Accessing the memory buffer out of bounds
The line
*(dict + i) = getc(fp);
will access dict out of bounds, for the following reasons:
The variable dict is a pointer to a memory buffer of size bytes. The type of this pointer is int *. However, that does not mean that there is room for size elements of type int in the memory buffer. On most platforms, an int has a size of 4 bytes. Therefore, the memory buffer only has room for size / 4 elements of type int.
For example, if size is 100, then dict will be pointing to a memory buffer of 100 bytes, which can only store 25 int elements. In that case, valid indexes range from 0 to 24.
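To make the arithmetic concrete, here is a minimal sketch. The helper name int_capacity is hypothetical and not part of your code; it only computes how many int elements fit into a buffer of a given number of bytes, using sizeof(int) rather than assuming 4.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original code): returns how many
   elements of type int fit into a memory buffer of `bytes` bytes. */
size_t int_capacity( size_t bytes )
{
    /* integer division discards any leftover bytes that cannot hold
       a complete int */
    return bytes / sizeof(int);
}
```

On a platform where sizeof(int) is 4, int_capacity(100) yields 25, so the valid indexes would be 0 to 24.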
However, in your loop
for (i = 0; i <= size; ++i)
your index goes from 0 to size inclusive, which, if we stay with our example of size being 100, means from 0 to 100, including 100. However, as previously stated, in that example, valid indexes are from 0 to 24, because there is only room for 25 int elements. Since the highest index you are using is 100 instead of 24, you are accessing the array out of bounds.
In order to fix this, you should do two things:
You should change the line
int *dict = malloc(size);
to
char *dict = malloc(size);
so that, if we stay with the example of size being 100, you can now store 100 elements of type char in the memory buffer, instead of only 25 elements of type int.
You should also change the line
for (i = 0; i <= size; ++i)
to
for (i = 0; i < size; ++i)
so that, if we stay with the example of size being 100, it uses the indexes 0 to 99 instead of 0 to 100, because 100 is not a valid index and would be out of bounds.
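Putting both fixes together, the corrected loop could look roughly like the following sketch. The function name read_up_to and its out_len parameter are hypothetical, and the check for EOF is an addition that your original loop does not have; it is included here so that the sketch never stores EOF in the buffer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original code): reads at most
   `size` characters from fp into a newly allocated char buffer.
   The number of characters actually read is stored in *out_len.
   Returns NULL if the memory allocation fails. The caller must
   free the returned buffer. */
char *read_up_to( FILE *fp, size_t size, size_t *out_len )
{
    /* char buffer, so there is room for `size` elements; request at
       least 1 byte, because the result of malloc(0) may be NULL */
    char *dict = malloc( size != 0 ? size : 1 );
    size_t i;
    int c;

    if ( dict == NULL )
        return NULL;

    /* i < size, so the highest index used is size - 1 */
    for ( i = 0; i < size; ++i )
    {
        c = getc( fp );

        /* stop early on end-of-file or read error */
        if ( c == EOF )
            break;

        dict[i] = (char)c;
    }

    *out_len = i;
    return dict;
}
```

With size being 100, this loop uses the indexes 0 to 99 at most, and fewer if the file is shorter than that.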
Determining the length of a file
With the lines
fseek(fp, 0, SEEK_END);
size = ftell(fp);
you seem to be attempting to determine the length of the file. However, in ISO C, there is no reliable way to determine the length of a file without reading the entire file, because
- when the stream is opened in text mode, the return value of the function ftell is unspecified and only meaningful as input to the function fseek (i.e. it does not necessarily specify the length of the file), and
- when the stream is opened in binary mode, a function call to fseek with the argument SEEK_END is not guaranteed to be meaningfully supported.
Therefore, even if those lines of code that you are using happen to work on your platform, they are not guaranteed to work on all platforms.
Since the only way to determine the file size that is guaranteed to work on all ISO C compliant platforms is to read the entire file, in the following section, I will provide such a solution, which reads the entire file once and grows the memory buffer as necessary.
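Before the full programs, here is the bare idea in isolation: a minimal sketch that determines the length of a stream purely by reading it until EOF. The function name count_chars is hypothetical. Note that for a stream opened in text mode, the count is the number of characters delivered to the program, which may differ from the number of bytes stored on disk on platforms that translate line endings.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original code): counts how many
   characters can be read from fp, starting at the current position,
   until getc reports EOF (end-of-file or a read error). */
size_t count_chars( FILE *fp )
{
    size_t count = 0;

    while ( getc( fp ) != EOF )
        ++count;

    return count;
}
```

Since this consumes the stream, a program that also needs the file contents would either have to rewind and read a second time, or, as in the programs below, store the characters while counting them.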
Solution which is guaranteed to work on all platforms
The following solution does not rely on any platform-specific behavior and is guaranteed to work on all ISO C compliant platforms.
#include <stdio.h>
#include <stdlib.h>

#define INITIAL_BUFFER_CAPACITY 512

int main( int argc, char *argv[] )
{
    FILE *fp;
    char *data;
    size_t data_size = 0, data_capacity = INITIAL_BUFFER_CAPACITY;
    int c;

    //verify that argv[1] exists
    if ( argc < 2 )
    {
        fprintf( stderr, "This program requires an argument.\n" );
        exit( EXIT_FAILURE );
    }

    //attempt to open file
    fp = fopen( argv[1], "r" );
    if ( fp == NULL )
    {
        fprintf( stderr, "Error opening file!\n" );
        exit( EXIT_FAILURE );
    }

    //allocate memory for initial memory buffer
    data = malloc( data_capacity );
    if ( data == NULL )
    {
        fprintf( stderr, "Memory allocation error!\n" );
        exit( EXIT_FAILURE );
    }

    //read the entire file one character per loop iteration
    while ( ( c = getc( fp ) ) != EOF )
    {
        //grow the buffer, if necessary
        if ( data_size == data_capacity )
        {
            data_capacity *= 2;
            data = realloc( data, data_capacity );
            if ( data == NULL )
            {
                fprintf( stderr, "Memory allocation error!\n" );
                exit( EXIT_FAILURE );
            }
        }

        //write the character to the memory buffer and update the size
        data[data_size++] = c;
    }

    //print the entire file contents as text
    fwrite( data, data_size, 1, stdout );

    //cleanup
    free( data );
    fclose( fp );

    return EXIT_SUCCESS;
}
Here is a more efficient solution which uses fread to read as much as possible at once, instead of only one character at a time.
#include <stdio.h>
#include <stdlib.h>

#define INITIAL_BUFFER_CAPACITY 512

int main( int argc, char *argv[] )
{
    FILE *fp;
    char *data = NULL;
    size_t data_size = 0, data_capacity = INITIAL_BUFFER_CAPACITY;

    //verify that argv[1] exists
    if ( argc < 2 )
    {
        fprintf( stderr, "This program requires an argument.\n" );
        exit( EXIT_FAILURE );
    }

    //attempt to open file
    fp = fopen( argv[1], "r" );
    if ( fp == NULL )
    {
        fprintf( stderr, "Error opening file!\n" );
        exit( EXIT_FAILURE );
    }

    //fill the memory buffer in every loop iteration and expand it
    //for the next read
    for (;;) //infinite loop, equivalent to while(1)
    {
        size_t bytes_read, bytes_to_read;

        //grow the buffer to desired capacity
        data = realloc( data, data_capacity );
        if ( data == NULL )
        {
            fprintf( stderr, "Memory allocation error!\n" );
            exit( EXIT_FAILURE );
        }

        //calculate number of bytes to read in next read operation
        bytes_to_read = data_capacity - data_size;

        //attempt to fill the read buffer
        bytes_read = fread( data + data_size, 1, bytes_to_read, fp );

        //update size of data
        data_size += bytes_read;

        //break out of the infinite loop if the read buffer could
        //not be filled entirely
        if ( bytes_read != bytes_to_read )
            break;

        //change desired capacity for next loop iteration
        data_capacity *= 2;
    }

    //print the entire file contents as text
    fwrite( data, data_size, 1, stdout );

    //cleanup
    free( data );
    fclose( fp );

    return EXIT_SUCCESS;
}