Here is a model demonstrating how simple what I'm trying to do is:
int main()
{
char *file_content[];
load_file_to_array(file_content);
}
void load_file_to_array(char *to_load[]){
// something to update file_content
}
Basic usage
Alright, here's an approach that allows you to do what you want with 100% static memory allocation, which I prefer whenever possible for many reasons (some listed below).
Example usage, to show how simple it is to use:
int main()
{
// Make this huge struct `static` so that the buffers it contains will be
// `static` so that they are neither on the stack **nor** the heap, thereby
// preventing stack overflow in the event you make them larger than the stack
// size, which is ~7.4 MB for Linux. See my answer here:
// https://stackoverflow.com/a/64085509/4561887.
static file_t file;
const char FILENAME[] = "path/to/some/file.txt";
// OR:
// const char FILENAME[] = __FILE__; // read this source code file itself
file_store_path(&file, FILENAME);
printf("Loading file at path \"%s\".\n", file.path);
// open the file and copy its entire contents into the `file` object
file_load(&file);
printf("Printing the entire file:\n");
file_print_all(&file);
printf("\n");
printf("Printing just this 1 line number:\n");
file_print_line(&file, 256);
printf("\n");
printf("Printing 4 lines starting at this line number:\n");
file_print_lines(&file, 256, 4);
printf("\n");
return 0;
}
How does it work?
I don't use any 2D arrays or anything. Rather, I create a static 1D array of chars as `char file_str[MAX_NUM_CHARS];`, which contains all chars from the entire file, and I create a static 1D array of `char *` (ptr to char) as `char* line_array[MAX_NUM_LINES];`, which contains pointers to the first char of each line in the entire file, where the chars these pointers are pointing to are inside the `file_str` array. Then, I read the file one char at a time into the `file_str` array. Each time I see a `\n` newline char I know that the next char is the start of a new line, so I add a ptr to that char into the `line_array` array.
To print the file back out I iterate through the `line_array` array, printing all chars from each line, one line at a time.
One could choose to get rid of the `line_array` array entirely and just use the `file_str` character array. You could still choose a line to print and print it. The drawback of that approach, however, is that finding the start of the line to print would take O(n) time since you'd have to start at the first char in the file and read all the way through it to the line of interest, counting lines by counting the number of newline `\n` chars you see. My approach, on the other hand, takes O(1) time and indexes straight to the front of the line of interest through a simple index into the `line_array` array.
The arrays above are stored in a `file_t` struct, defined as follows:
#define MAX_NUM_LINES 10000UL
#define MAX_NUM_CHARS (MAX_NUM_LINES*200UL) // 2 MB
#define MAX_PATH_LEN (1000)
typedef struct file_s
{
/// The path to the file to open.
char path[MAX_PATH_LEN];
/// All characters read from the file.
char file_str[MAX_NUM_CHARS]; // array of `char`
/// The total number of chars read into the `file_str` string, not including
/// any null terminator (`file_load()` does not write one).
size_t num_chars;
/// A ptr to each line in the file.
char* line_array[MAX_NUM_LINES]; // array of `char*` (ptr to char)
/// The total number of lines in the file, and hence in the `line_array`
// above.
size_t num_lines;
} file_t;
Here are some stats I print out about `file_t` at the beginning of my program:
The size of the `file_t` struct is 2081016 bytes (2.081016 MB; 1.984612 MiB).
Max file size that can be read into this struct is 2000000 bytes or 10000 lines, whichever limit is hit first.
The `file` object is static so that if you ever make your `file_t` struct really huge to handle massive files (many gigabytes even, if you wanted), you don't have a stack overflow, since thread stack sizes are limited to ~7.4 MB for Linux. See my answer here: C/C++ maximum stack size of program (https://stackoverflow.com/a/64085509/4561887).
I use static memory allocation, not dynamic, for these reasons:
- It's faster at run-time since lots of dynamic memory allocation can add substantial overhead.
- It's deterministic to use static memory allocation, making this implementation style good for
safety-critical, memory-constrained, real-time, deterministic, embedded devices and programs.
- It can access the first character of any line in the file in O(1) time via a simple index into
an array.
- It's extensible via dynamic memory allocation if needed (more on this below).
The file is opened and loaded into the `file` object via the `file_load()` function. That function looks like this, with robust error handling included:
/// Read all characters from a file on your system at the path specified in the
/// file object and copy this file data **into** the passed-in `file` object.
void file_load(file_t* file)
{
if (file == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
FILE* fp = fopen(file->path, "r");
if (fp == NULL)
{
printf("ERROR in function %s(): Failed to open file (%s).\n",
__func__, strerror(errno));
return;
}
// See: https://en.cppreference.com/w/c/io/fgetc
int c; // note: int, not char, required to handle EOF
size_t i_write_char = 0;
size_t i_write_line = 0;
bool start_of_line = true;
const size_t I_WRITE_CHAR_MAX = ARRAY_LEN(file->file_str) - 1;
const size_t I_WRITE_LINE_MAX = ARRAY_LEN(file->line_array) - 1;
while ((c = fgetc(fp)) != EOF) // standard C I/O file reading loop
{
// 1. Write the char
if (i_write_char > I_WRITE_CHAR_MAX)
{
printf("ERROR in function %s(): file is full (i_write_char = "
"%zu, but I_WRITE_CHAR_MAX is only %zu).\n",
__func__, i_write_char, I_WRITE_CHAR_MAX);
break;
}
file->file_str[i_write_char] = c;
// 2. Write the ptr to the line
if (start_of_line)
{
start_of_line = false;
if (i_write_line > I_WRITE_LINE_MAX)
{
printf("ERROR in function %s(): file is full (i_write_line = "
"%zu, but I_WRITE_LINE_MAX is only %zu).\n",
__func__, i_write_line, I_WRITE_LINE_MAX);
break;
}
file->line_array[i_write_line] = &(file->file_str[i_write_char]);
i_write_line++;
}
// end of line
if (c == '\n')
{
// '\n' indicates the end of a line, so prepare to start a new line
// on the next iteration
start_of_line = true;
}
i_write_char++;
}
file->num_chars = i_write_char;
file->num_lines = i_write_line;
fclose(fp);
}
Extend it via dynamic memory allocation
In many cases, the static memory allocation above is sufficient. It is good to avoid dynamic memory allocation whenever possible. However, if you're going to use dynamic memory allocation, do not allocate tons of small chunks! It's better to allocate one or two big chunks instead. Speed test this to prove it--always verify what people say--myself included! As evidence of this, though, read this too and look at the plot there: https://github.com/facontidavide/CPP_Optimizations_Diary/blob/master/docs/reserve.md. In that plot, "NoReserve" is with lots of small dynamic memory allocations and subsequent memory copies of existing data into these new memory locations, and "WithReserve" is with one single large dynamic memory allocation up-front instead.
Sometimes dynamic memory allocation is prudent, however. How can we best do it? Let's say you needed to open 1000 files and have them all open at the same time, with file sizes ranging from a few bytes to a few GB. In that case, using static memory allocation for all 1000 files would be not only bad, but nearly impossible (at the least, awkward and space-inefficient). What you should do instead is make `file_t` large enough to hold even the largest file of a few GB in size, but then statically allocate one instance of it to use as a buffer, and then do this: after opening each file (one at a time) and loading it into the single `file` object you have, dynamically `malloc()` the exact amount of memory you need for that file, and `strncpy()` or `memcpy()` all the data over from that initial static `file` object into a dynamic object which has the exact amount of memory the file needs, with no waste. In this way, the static `file` object is just there as a placeholder, or buffer, to read the file, counting the number of bytes and number of lines at the same time, so you can then dynamically allocate just enough memory for those bytes and lines.
Full code
Here is the full code. This program opens up the source code file itself and prints it all out, printing line numbers at the start of each line just for fun.
read_file_into_c_string_and_array_of_lines.c, from my eRCaGuy_hello_world repo:
#include <errno.h>
#include <stdbool.h> // For `true` (`1`) and `false` (`0`) macros in C
#include <stdint.h> // For `uint8_t`, `int8_t`, etc.
#include <stdio.h> // For `printf()`
#include <string.h> // for `strerror()`
// Get the number of elements in any C array
// - Usage example: [my own answer]:
// https://arduino.stackexchange.com/questions/80236/initializing-array-of-structs/80289#80289
#define ARRAY_LEN(array) (sizeof(array) / sizeof(array[0]))
/// Max and min gcc/clang **statement expressions** (safer than macros) for C. By Gabriel Staples.
/// See: https://stackoverflow.com/a/58532788/4561887
#define MAX(a, b) \
({ \
__typeof__(a) _a = (a); \
__typeof__(b) _b = (b); \
_a > _b ? _a : _b; \
})
#define MIN(a, b) \
({ \
__typeof__(a) _a = (a); \
__typeof__(b) _b = (b); \
_a < _b ? _a : _b; \
})
/// Bytes per megabyte
#define BYTES_PER_MB (1000*1000)
/// Bytes per mebibyte
#define BYTES_PER_MIB (1024*1024)
/// Convert bytes to megabytes
#define BYTES_TO_MB(bytes) (((double)(bytes))/BYTES_PER_MB)
/// Convert bytes to mebibytes
#define BYTES_TO_MIB(bytes) (((double)(bytes))/BYTES_PER_MIB)
#define MAX_NUM_LINES 10000UL
#define MAX_NUM_CHARS (MAX_NUM_LINES*200UL) // 2 MB
#define MAX_PATH_LEN (1000)
typedef struct file_s
{
/// The path to the file to open.
char path[MAX_PATH_LEN];
/// All characters read from the file.
char file_str[MAX_NUM_CHARS]; // array of `char`
/// The total number of chars read into the `file_str` string, not including
/// any null terminator (`file_load()` does not write one).
size_t num_chars;
/// A ptr to each line in the file.
char* line_array[MAX_NUM_LINES]; // array of `char*` (ptr to char)
/// The total number of lines in the file, and hence in the `line_array`
// above.
size_t num_lines;
} file_t;
/// Copy the file path pointed to by `path` into the `file_t` object.
void file_store_path(file_t* file, const char *path)
{
if (file == NULL || path == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
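// `strncpy()` does not null-terminate when `path` is too long, so copy at most
// `sizeof(file->path) - 1` chars and always terminate the string explicitly.
strncpy(file->path, path, sizeof(file->path) - 1);
file->path[sizeof(file->path) - 1] = '\0';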
}
/// Print the entire line at 1-based line number `line_number` in file `file`, including the
/// '\n' at the end of the line.
void file_print_line(const file_t* file, size_t line_number)
{
if (file == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
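// Ensure we don't read outside the `file->line_array`. Line numbers are
// 1-based, so 0 is invalid too (and `line_number - 1` below would wrap around).
if (line_number == 0 || line_number > file->num_lines)
{
printf("ERROR in function %s(): line_number (%zu) is out of range (file->num_lines = %zu).\n",
__func__, line_number, file->num_lines);
return;
}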
size_t i_line = line_number - 1;
char* line = file->line_array[i_line];
if (line == NULL)
{
printf("ERROR in function %s(): line_array contains NULL ptr for line_number = %zu at "
"index = %zu.\n", __func__, line_number, i_line);
return;
}
// print all chars in the line
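const char* const file_end = file->file_str + file->num_chars;
size_t i_char = 0;
while (true)
{
// Compare against the end of the valid data; `i_char` is relative to the
// start of this line, not to the start of the file
if (line + i_char >= file_end)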
{
// outside valid data
break;
}
char c = line[i_char];
if (c == '\n')
{
printf("%c", c);
break;
}
else if (c == '\0')
{
// null terminator
break;
}
printf("%c", c);
i_char++;
}
}
/// Print `num_lines` number of lines in a file, starting at 1-based line number `first_line`,
/// and including the '\n' at the end of each line.
/// At the start of each line, the line number is also printed, followed by a colon (:).
void file_print_lines(const file_t* file, size_t first_line, size_t num_lines)
{
if (file == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
if (num_lines == 0 || file->num_lines == 0)
{
printf("ERROR in function %s(): num_lines passed in == %zu; file->num_lines = %zu.\n",
__func__, num_lines, file->num_lines);
return;
}
// Ensure we don't read outside the valid data
size_t last_line = MIN(first_line + num_lines - 1, file->num_lines);
// printf("last_line = %zu\n", last_line); // DEBUGGING
for (size_t line_number = first_line; line_number <= last_line; line_number++)
{
printf("%4zu: ", line_number);
file_print_line(file, line_number);
}
}
/// Print an entire file.
void file_print_all(const file_t* file)
{
printf("num_chars to print = %zu\n", file->num_chars);
printf("num_lines to print = %zu\n", file->num_lines);
printf("========== FILE START ==========\n");
file_print_lines(file, 1, file->num_lines);
printf("=========== FILE END ===========\n");
}
/// Read all characters from a file on your system at the path specified in the
// file object.
void file_load(file_t* file)
{
if (file == NULL)
{
printf("ERROR in function %s(): NULL ptr.\n", __func__);
return;
}
FILE* fp = fopen(file->path, "r");
if (fp == NULL)
{
printf("ERROR in function %s(): Failed to open file (%s).\n",
__func__, strerror(errno));
return;
}
// See: https://en.cppreference.com/w/c/io/fgetc
int c; // note: int, not char, required to handle EOF
size_t i_write_char = 0;
size_t i_write_line = 0;
bool start_of_line = true;
const size_t I_WRITE_CHAR_MAX = ARRAY_LEN(file->file_str) - 1;
const size_t I_WRITE_LINE_MAX = ARRAY_LEN(file->line_array) - 1;
while ((c = fgetc(fp)) != EOF) // standard C I/O file reading loop
{
// 1. Write the char
if (i_write_char > I_WRITE_CHAR_MAX)
{
printf("ERROR in function %s(): file is full (i_write_char = "
"%zu, but I_WRITE_CHAR_MAX is only %zu).\n",
__func__, i_write_char, I_WRITE_CHAR_MAX);
break;
}
file->file_str[i_write_char] = c;
// 2. Write the ptr to the line
if (start_of_line)
{
start_of_line = false;
if (i_write_line > I_WRITE_LINE_MAX)
{
printf("ERROR in function %s(): file is full (i_write_line = "
"%zu, but I_WRITE_LINE_MAX is only %zu).\n",
__func__, i_write_line, I_WRITE_LINE_MAX);
break;
}
file->line_array[i_write_line] = &(file->file_str[i_write_char]);
i_write_line++;
}
// end of line
if (c == '\n')
{
// '\n' indicates the end of a line, so prepare to start a new line
// on the next iteration
start_of_line = true;
}
i_write_char++;
}
file->num_chars = i_write_char;
file->num_lines = i_write_line;
fclose(fp);
}
// Make this huge struct `static` so that the buffers it contains will be `static` so that they are
// neither on the stack **nor** the heap, thereby preventing stack overflow in the event you make
// them larger than the stack size, which is ~7.4 MB for Linux and is generally even smaller on
// other systems. See my answer here:
// https://stackoverflow.com/a/64085509/4561887.
static file_t file;
// int main(int argc, char *argv[]) // alternative prototype
int main()
{
printf("The size of the `file_t` struct is %zu bytes (%.6f MB; %.6f MiB).\n"
"Max file size that can be read into this struct is %zu bytes or %zu lines, whichever "
"limit is hit first.\n\n",
sizeof(file_t), BYTES_TO_MB(sizeof(file_t)), BYTES_TO_MIB(sizeof(file_t)),
sizeof(file.file_str), ARRAY_LEN(file.line_array));
const char FILENAME[] = __FILE__;
file_store_path(&file, FILENAME);
printf("Loading file at path \"%s\".\n", file.path);
// open the file and copy its entire contents into the `file` object
file_load(&file);
printf("Printing the entire file:\n");
file_print_all(&file);
printf("\n");
printf("Printing just this 1 line number:\n");
file_print_line(&file, 256);
printf("\n");
printf("Printing 4 lines starting at this line number:\n");
file_print_lines(&file, 256, 4);
printf("\n");
// FOR TESTING: intentionally cause some errors by trying to print some lines for an unpopulated
// file object. Example errors:
// 243: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 243 at index = 242.
// 244: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 244 at index = 243.
// 245: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 245 at index = 244.
// 246: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 246 at index = 245.
// Note: for kicks (since I didn't realize this was possible), I'm also using the variable name
// `$` for this `file_t` object.
printf("Causing some intentional errors here:\n");
file_t $;
file_print_lines(&$, 243, 4);
return 0;
}
Build and run commands:
- In C:
mkdir -p bin && gcc -Wall -Wextra -Werror -O3 -std=c17 \
read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
- In C++
mkdir -p bin && g++ -Wall -Wextra -Werror -O3 -std=c++17 \
read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
Sample run cmd and output (most lines in the middle deleted, since the bulk of the output is just a print out of the source code itself):
eRCaGuy_hello_world/c$ gcc -Wall -Wextra -Werror -O3 -std=c17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
The size of the `file_t` struct is 2081016 bytes (2.081016 MB; 1.984612 MiB).
Max file size that can be read into this struct is 2000000 bytes or 10000 lines, whichever limit is hit first.
Loading file at path "read_file_into_c_string_and_array_of_lines.c".
Printing the entire file:
num_chars to print = 15603
num_lines to print = 425
========== FILE START ==========
1: /*
2: This file is part of eRCaGuy_hello_world: https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world
3:
4: GS
5: 2 Mar. 2022
6:
7: Read a file in C into a C-string (array of chars), while also placing pointers to the start of each
8: line into another array of `char *`. This way you have all the data plus the
9: individually-addressable lines. Use static memory allocation, not dynamic, for these reasons:
10: 1. It's a good demo.
11: 1. It's faster at run-time since lots of dynamic memory allocation can add substantial overhead.
12: 1. It's deterministic to use static memory allocation, making this implementation style good for
13: safety-critical, memory-constrained, real-time, deterministic, embedded devices and programs.
14: 1. It can access the first character of any line in the file in O(1) time via a simple index into
15: an array.
16: 1. It's extensible via dynamic memory allocation if needed.
17:
18: STATUS: works!
19:
20: To compile and run (assuming you've already `cd`ed into this dir):
21: 1. In C:
22: ```bash
23: gcc -Wall -Wextra -Werror -O3 -std=c17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
24: ```
25: 2. In C++
26: ```bash
27: g++ -Wall -Wextra -Werror -O3 -std=c++17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
28: ```
29:
.
.
.
406: // 300:
407: // 301: eRCaGuy_hello_world/c$ g++ -Wall -Wextra -Werror -O3 -std=c++17 read_file_into_c_string_and_array_of_lines.c -o bin/a && bin/a
408: // 302:
409: // 303:
410: // 304: */
411: // =========== FILE END ===========
412: //
413: // Printing just one line now:
414: // 255:
415: //
416: // Causing some intentional errors here:
417: // 243: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 243 at index = 242.
418: // 244: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 244 at index = 243.
419: // 245: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 245 at index = 244.
420: // 246: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 246 at index = 245.
421: //
422: //
423: // OR, in C++:
424: //
425: // [SAME AS THE C OUTPUT]
=========== FILE END ===========
Printing just this 1 line number:
file->line_array[i_write_line] = &(file->file_str[i_write_char]);
Printing 4 lines starting at this line number:
256: file->line_array[i_write_line] = &(file->file_str[i_write_char]);
257: i_write_line++;
258: }
259:
Causing some intentional errors here:
243: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 243 at index = 242.
244: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 244 at index = 243.
245: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 245 at index = 244.
246: ERROR in function file_print_line(): line_array contains NULL ptr for line_number = 246 at index = 245.
References
(not a complete list by any means)
- read_file_into_c_string_and_array_of_lines.c, from my eRCaGuy_hello_world repo
- https://en.cppreference.com/w/c/io/fopen
- https://en.cppreference.com/w/c/string/byte/strerror - shows a good usage example of `printf("File opening error: %s\n", strerror(errno));` if `fopen()` fails when opening a file.
- https://en.cppreference.com/w/c/io/fgetc