"Best Practices" is somewhat subjective, but "fully validated, logical and readable" should always be the goal.
For reading a fixed number of fields (in your case choosing cols 1, 2, 5
as string values of unknown length) and cols 3, 4
as simple int
values), you can read an unknown number of rows from a file simply by allocating storage for some reasonably anticipated number of rows of data, keeping track of how many rows are filled, and then reallocating storage, as required, when you reach the limit of the storage you have allocated.
An efficient way of handling the reallocation is to reallocate by some reasonable number of additional blocks of memory when reallocation is required (rather than making calls to realloc
for every additional row of data). You can either add a fixed number of new blocks, multiply what you have by 3/2
or 2
or some other sane scheme that meets your needs. I generally just double the storage each time the allocation limit is reached.
Since you have a fixed number of fields of unknown size, you can make things easy by simply separating the five-fields with sscanf
and validating that 5 conversions took place by checking the sscanf
return. If you were reading an unknown number of fields, then you would just use the same reallocation scheme to handle the column-wise expansion discussed above for reading an unknown number of rows.
(there is no requirement that any row have the same number of fields in that case, but you can force a check by setting a variable containing the number of fields read with the first row, and then validating that all subsequent rows have that same number...)
As discussed in the comments, reading a line of data with a line-oriented input function like fgets
or POSIX getline
and then parsing the data by either tokenizing with strtok
, or in this case with a fixed number of fields simply parsing with sscanf
is a solid approach. It provides the benefit of allowing independent validation of (1) the read of data from the file; and (2) the parse of data into the needed values. (while less flexible, for some data sets, you can do this in a single step with fscanf
, but that also injects the scanf
problems for user input of what remains unread in the input buffer depending on the conversion-specifiers used...)
The easiest way to approach storage of your 5-fields is to declare a simple struct
. Since the number of characters for each of the character fields is unknown, the struct members for each of these fields will be a character pointer, and the remaining fields int
, e.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ARRSZ 2 /* use 8 or more, set to 2 here to force realloc */
#define MAXC 1024
typedef struct {
char *col1, *col2, *col5;
int col3, col4;
} mydata_t;
Now you can start your allocation for handling an unknown number of these by allocating for some reasonably anticipated amount (I would generally use 8
or 16
with a doubling scheme as that will grow reasonably fast), but we have chosen 2
here with #define ARRSZ 2
to make sure we force one reallocation when handling your 3-line data file. Note also we are setting a maximum number of characters per-line of #define MAXC 1024
for your data (don't skimp on buffer size)
To get started, all we need to do is declare a buffer to hold each line, and a few variables to track the currently allocated number of structs, a line counter (to output accurate error messages) and a counter for the number of rows of data we have filled. Then when (rows_filled == allocated_array_size)
you realloc
, e.g.
int main (int argc, char **argv) {
char buf[MAXC];
size_t arrsz = ARRSZ, line = 0, row = 0;
mydata_t *data = NULL;
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
/* allocate an 'arrsz' initial number of struct */
if (!(data = malloc (arrsz * sizeof *data))) {
perror ("malloc-data");
return 1;
}
while (fgets (buf, MAXC, fp)) { /* read each line from file */
char c1[MAXC], c2[MAXC], c5[MAXC]; /* temp strings for c1,2,5 */
int c3, c4; /* temp ints for c3,4 */
size_t len = strlen (buf); /* length for validation */
line++; /* increment line count */
/* validate line fit in buffer */
if (len && buf[len-1] != '\n' && len == MAXC - 1) {
fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
return 1;
}
if (row == arrsz) { /* check if all pointers used */
void *tmp = realloc (data, arrsz * 2 * sizeof *data);
if (!tmp) { /* validate realloc succeeded */
perror ("realloc-data");
break; /* break, don't exit, data still valid */
}
data = tmp; /* assign realloc'ed block to data */
arrsz *= 2; /* update arrsz to reflect new allocation */
}
(note: when calling realloc
, you never realloc the pointer itself, e.g. data = realloc (data, new_size);
If realloc
fails (and it does), it returns NULL
which would overwrite your original pointer causing a memory leak. Always realloc
with a temporary pointer, validate, then assign the new block of memory to your original pointer)
What remains is just splitting the line into our values, handling any errors in line format, adding our field values to our array of struct, increment our row/line counts and repeating until we run out of data to read, e.g.
/* parse buf into fields, handle error on invalid format of line */
if (sscanf (buf, "%1023s %1023s %d %d %1023s",
c1, c2, &c3, &c4, c5) != 5) {
fprintf (stderr, "error: invalid format line %zu\n", line);
continue; /* get next line */
}
/* allocate copy strings, assign allocated blocks to pointers */
if (!(data[row].col1 = mystrdup (c1))) { /* validate copy of c1 */
fprintf (stderr, "error: malloc-c1 line %zu\n", line);
break; /* same reason to break not exit */
}
if (!(data[row].col2 = mystrdup (c2))) { /* validate copy of c2 */
fprintf (stderr, "error: malloc-c1 line %zu\n", line);
break; /* same reason to break not exit */
}
data[row].col3 = c3; /* assign integer values */
data[row].col4 = c4;
if (!(data[row].col5 = mystrdup (c5))) { /* validate copy of c5 */
fprintf (stderr, "error: malloc-c1 line %zu\n", line);
break; /* same reason to break not exit */
}
row++; /* increment number of row pointers used */
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
puts ("values stored in struct\n");
for (size_t i = 0; i < row; i++)
printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2,
data[i].col3, data[i].col4, data[i].col5);
freemydata (data, row);
return 0;
}
And we are done (except for the memory use/error check)
Note about there are two helper functions to allocate for each string and copy each string to its allocated block of memory and assigning the starting address for that block to our pointer in our struct. mystrdup()
You can use strdup()
if you have it, I simply included the function to show you how to manually handle the malloc
and copy. Note: how the copy is done with memcpy
instead of strcpy
-- Why? You already scanned forward in the string to find '\0'
when you found the length with strlen
-- no need to have strcpy
repeat that process again -- just use memcpy
.
/* simple implementation of strdup - in the event you don't have it */
char *mystrdup (const char *s)
{
if (!s) /* validate s not NULL */
return NULL;
size_t len = strlen (s); /* get length */
char *sdup = malloc (len + 1); /* allocate length + 1 */
if (!sdup) /* validate */
return NULL;
return memcpy (sdup, s, len + 1); /* pointer to copied string */
}
Last helper function is freemydata()
which just calls free()
on each allocated block to insure you free all the memory you have allocated. It also keeps you code tidy. (you can do the same for the realloc
block to move that to it's own function as well)
/* simple function to free all data when done */
void freemydata (mydata_t *data, size_t n)
{
for (size_t i = 0; i < n; i++) { /* free allocated strings */
free (data[i].col1);
free (data[i].col2);
free (data[i].col5);
}
free (data); /* free structs */
}
Putting all the pieces together would give you:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ARRSZ 2 /* use 8 or more, set to 2 here to force realloc */
#define MAXC 1024
typedef struct {
char *col1, *col2, *col5;
int col3, col4;
} mydata_t;
/* simple implementation of strdup - in the event you don't have it */
char *mystrdup (const char *s)
{
if (!s) /* validate s not NULL */
return NULL;
size_t len = strlen (s); /* get length */
char *sdup = malloc (len + 1); /* allocate length + 1 */
if (!sdup) /* validate */
return NULL;
return memcpy (sdup, s, len + 1); /* pointer to copied string */
}
/* simple function to free all data when done */
void freemydata (mydata_t *data, size_t n)
{
for (size_t i = 0; i < n; i++) { /* free allocated strings */
free (data[i].col1);
free (data[i].col2);
free (data[i].col5);
}
free (data); /* free structs */
}
int main (int argc, char **argv) {
char buf[MAXC];
size_t arrsz = ARRSZ, line = 0, row = 0;
mydata_t *data = NULL;
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
/* allocate an 'arrsz' initial number of struct */
if (!(data = malloc (arrsz * sizeof *data))) {
perror ("malloc-data");
return 1;
}
while (fgets (buf, MAXC, fp)) { /* read each line from file */
char c1[MAXC], c2[MAXC], c5[MAXC]; /* temp strings for c1,2,5 */
int c3, c4; /* temp ints for c3,4 */
size_t len = strlen (buf); /* length for validation */
line++; /* increment line count */
/* validate line fit in buffer */
if (len && buf[len-1] != '\n' && len == MAXC - 1) {
fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
return 1;
}
if (row == arrsz) { /* check if all pointers used */
void *tmp = realloc (data, arrsz * 2 * sizeof *data);
if (!tmp) { /* validate realloc succeeded */
perror ("realloc-data");
break; /* break, don't exit, data still valid */
}
data = tmp; /* assign realloc'ed block to data */
arrsz *= 2; /* update arrsz to reflect new allocation */
}
/* parse buf into fields, handle error on invalid format of line */
if (sscanf (buf, "%1023s %1023s %d %d %1023s",
c1, c2, &c3, &c4, c5) != 5) {
fprintf (stderr, "error: invalid format line %zu\n", line);
continue; /* get next line */
}
/* allocate copy strings, assign allocated blocks to pointers */
if (!(data[row].col1 = mystrdup (c1))) { /* validate copy of c1 */
fprintf (stderr, "error: malloc-c1 line %zu\n", line);
break; /* same reason to break not exit */
}
if (!(data[row].col2 = mystrdup (c2))) { /* validate copy of c2 */
fprintf (stderr, "error: malloc-c1 line %zu\n", line);
break; /* same reason to break not exit */
}
data[row].col3 = c3; /* assign integer values */
data[row].col4 = c4;
if (!(data[row].col5 = mystrdup (c5))) { /* validate copy of c5 */
fprintf (stderr, "error: malloc-c1 line %zu\n", line);
break; /* same reason to break not exit */
}
row++; /* increment number of row pointers used */
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
puts ("values stored in struct\n");
for (size_t i = 0; i < row; i++)
printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2,
data[i].col3, data[i].col4, data[i].col5);
freemydata (data, row);
return 0;
}
Now test.
Example Input File
$ cat dat/fivefields.txt
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3
Example Use/Output
$ ./bin/fgets_fields <dat/fivefields.txt
values stored in struct
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind
is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/fgets_fields <dat/fivefields.txt
==1721== Memcheck, a memory error detector
==1721== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1721== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==1721== Command: ./bin/fgets_fields
==1721==
values stored in struct
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3
==1721==
==1721== HEAP SUMMARY:
==1721== in use at exit: 0 bytes in 0 blocks
==1721== total heap usage: 11 allocs, 11 frees, 243 bytes allocated
==1721==
==1721== All heap blocks were freed -- no leaks are possible
==1721==
==1721== For counts of detected and suppressed errors, rerun with: -v
==1721== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.