There are many ways to approach this problem. Stephan has a good approach using the functions available in string.h
(and kindly contributed the example source
string). Another basic way to approach this problem (or any string parsing problem) is to simply walk-a-pointer down the string, comparing characters as you go and taking the appropriate action.
When doing so with multiple-delimiters (e.g. ','
and (...)
, it is often helpful to indicate the "state" of your position within the original string. Here a simple flag in
(for inside or outside (...)
) well let you control whether you copy characters to your array or skip them.
The rest is just keeping track of your indexes and protecting your array bounds as you loop over each character (more of an accounting problem from a memory standpoint -- which you should do regardless)
Putting the pieces together, and providing additional details in comments in-line below, you could do something like the following:
#include <stdio.h>
#define MAXS 10 /* if you need constants -- declare them */
#define MAXL 30 /* (don't use 'magic numbers' in code) */
int main (void) {
const char* source ="(000,P,ray ),"
"(100,D,ray ),"
"(009,L,art ),"
"(0000,C,max ),"
"(0000,S,ben ),"
"(020,P,kay ),"
"(040,L,photography ),"
"(001,C,max ),"
"(0001,S,ben ),"
"(0001,P,kay )";
char each[MAXS][MAXL] = {{0}},
*p = (char *)source;
int i = 0, in = 0, ndx = 0; /* in - state flag, ndx - row index */
while (ndx < MAXS && *p) { /* loop over all chars filling 'each' */
if (*p == '(') { /* (while protecting your row bounds) */
each[ndx][i++] = *p; /* copy opening '(' */
in = 1; /* set flag 'in'side record */
}
else if (*p == ')') {
each[ndx][i++] = *p; /* copy closing ')' */
each[ndx++][i] = 0; /* nul-terminate */
i = in = 0; /* reset 'i' and 'in' */
}
else if (in) { /* if we are 'in', copy char */
each[ndx][i++] = *p;
}
if (i + 1 == MAXL) { /* protect column bounds */
fprintf (stderr, "record exceeds %d chars.\n", MAXL);
return 1;
}
p++; /* increment pointer */
}
for (i = 0; i < ndx; i++) /* display results */
printf ("each[%2d] : %s\n", i, each[i]);
return 0;
}
(note: above, each row in each
will be nul-terminated by default as a result of initializing all characters in each
to zero at declaration, but it is still good practice to affirmatively nul-terminate all strings)
Example Use/Output
$ ./bin/testparse
each[ 0] : (000,P,ray )
each[ 1] : (100,D,ray )
each[ 2] : (009,L,art )
each[ 3] : (0000,C,max )
each[ 4] : (0000,S,ben )
each[ 5] : (020,P,kay )
each[ 6] : (040,L,photography )
each[ 7] : (001,C,max )
each[ 8] : (0001,S,ben )
each[ 9] : (0001,P,kay )
Get comfortable using either method. You can experiment whether using if.. else if..
or a switch
best fits any parsing problem. The functions in string.h
can be the better choice. It all depends on your input. Being comfortable with both approaches helps you better tailor your code to the problem at hand.
Example with getline
and realloc
of Rows
Since you are using getline
to read each line, it will potentially read and allocate storage for an unlimited number of records (e.g. (...)
). The way to handle this is to allocate storage for your records (pointers) dynamically, keep track of the number of pointers used, and realloc
to allocate more pointers when you reach your record limit. You will need to validate each allocation, and understand you allocate each
as a pointer-to-pointer-to-char (e.g. char **each
) instead of each
being a 2D array (e.g. char each[rows][cols]
). (though you will still access and use the string held with each the same way (e.g. each[0], each[1], ...
))
The code below will read from the filename given as the first argument (or from stdin
if no argument is given). The approach is a standard approach for handling this type problem. each
is declared as char **each = NULL;
(a pointer-to-pointer-to-char). You then allocate an initial number of pointers (rows) for each with:
each = malloc (rows * sizeof *each); /* allocate rows no. of pointers */
if (!each) { /* validate allocation */
perror ("each - memory exhausted"); /* throw error */
return 1;
}
You then use getline
to read each line into a buffer (buf
) and pass a pointer to buf
to the logic we used above. (NOTE, you must preserve a pointer to buf
as buf
points to storage dynamically allocated by getline
that you must free
later.)
The only addition to the normal parsing logic is we now need to allocate storage for each of the records we parse, and assign the address of the block of memory holding each record to each[x]
. (we use strcpy
for that purpose after allocating the storage for each record).
To simplify parsing, we originally parse each record into a fixed size buffer (rec
) since we do not know the length of each record ahead of time. You can dynamically allocate/reallocate for rec
as well, but that adds an additional level of complexity -- and I suspect you will struggle with the additions as they stand now. Just understand we parse each record from buf
into rec
(which we set at 256
chars #define MAXR 256
-- more than ample for the expected 30-31
char record size) Even though we use a fixed length rec
, we still check i
against MAXR
to protect the fixed array bounds.
The storage for each record and copy of parsed records from rec
to each[ndx]
is handled when a closing )
is encountered as follows:
(note - storage for the nul-character is included in 'i'
where you would normally see 'i + 1'
)
each[ndx] = malloc (i); /* allocate storage for rec */
if (!each[ndx]) { /* validate allocation */
perror ("each[ndx] - memory exhausted");
return 1;
}
strcpy (each[ndx], rec);/* copy rec to each[ndx] */
(note: by approaching allocation in this manner, you allocate the exact amount of storage you need for each record. There is no wasted space. You can handle 1 record or 10,000,000 records (to the extent of the memory on your computer))
Here is your example. Take time to understand what every line does and why. Ask questions if you do not understand. This is the meat-and-potatoes of dynamic allocation and once you get it -- you will have a firm understanding of the basics for handling any of your storage needs.
#include <stdio.h>
#include <stdlib.h> /* for malloc, realloc */
#include <string.h> /* for strcpy */
#define ROWS 10 /* initial number of rows to allocate */
#define MAXR 256 /* maximum record length between (...) */
int main (int argc, char **argv) {
int in = 0; /* in - state flag */
char **each = NULL, /* pointer to pointer to char */
*buf = NULL; /* buffer for getline */
size_t rows = ROWS, /* currently allocated row pointers */
ndx = 0, /* ndx - row index */
n = 0, /* buf size (0 - getline decides) */
i = 0; /* loop counter */
ssize_t nchr = 0; /* num chars read by getline (return) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
each = malloc (rows * sizeof *each); /* allocate rows no. of pointers */
if (!each) { /* validate allocation */
perror ("each - memory exhausted"); /* throw error */
return 1;
}
while ((nchr = getline (&buf, &n, fp) != -1)) { /* read line into buf */
char *p = buf, /* pointer to buf */
rec[MAXR] = ""; /* temp buffer to hold record */
while (*p) { /* loop over all chars filling 'each' */
if (*p == '(') { /* (while protecting your row bounds) */
rec[i++] = *p; /* copy opening '(' */
in = 1; /* set flag 'in'side record */
}
else if (*p == ')') {
rec[i++] = *p; /* copy closing ')' */
rec[i++] = 0; /* nul-terminate */
each[ndx] = malloc (i); /* allocate storage for rec */
if (!each[ndx]) { /* validate allocation */
perror ("each[ndx] - memory exhausted");
return 1;
}
strcpy (each[ndx], rec);/* copy rec to each[ndx] */
i = in = 0; /* reset 'i' and 'in' */
ndx++; /* increment row index */
if (ndx == rows) { /* check if rows limit reached */
/* reallocate 2X number of pointers using tmp pointer */
void *tmp = realloc (each, rows * sizeof *each * 2);
if (!tmp) { /* validate realloc succeeded */
perror ("realloc each - memory exhausted");
goto memfull; /* each still contains original recs */
}
each = tmp; /* assign reallocated block to each */
rows *= 2; /* update rows with current pointers */
}
}
else if (in) { /* if we are 'in', copy char */
rec[i++] = *p;
}
if (i + 1 == MAXR) { /* protect column bounds */
fprintf (stderr, "record exceeds %d chars.\n", MAXR);
return 1;
}
p++; /* increment pointer */
}
}
memfull:; /* labet for goto */
free (buf); /* free memory allocated by getline */
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (i = 0; i < ndx; i++) { /* display results */
printf ("each[%2zu] : %s\n", i, each[i]);
free (each[i]); /* free memory for each record */
}
free (each); /* free pointers */
return 0;
}
(note: since nchr
isn't used to trim the '\n'
from the end of the buffer read by getline
, you can eliminate that variable. Just note that there is no need to call strlen
on the buffer returned by getline
as the number of characters read is the return value)
Example Use/Output
Note: for the input test, I just put your line of records in the file dat/delimrecs.txt
and copied it 4 times giving a total of 40
records in 4
lines.
$ ./bin/parse_str_state_getline <dat/delimrecs.txt
each[ 0] : (000,P,ray )
each[ 1] : (100,D,ray )
each[ 2] : (009,L,art )
each[ 3] : (0000,C,max )
each[ 4] : (0000,S,ben )
<snip 5 - 34>
each[35] : (020,P,kay )
each[36] : (040,L,photography )
each[37] : (001,C,max )
each[38] : (0001,S,ben )
each[39] : (0001,P,kay )
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux valgrind
is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ valgrind ./bin/parse_str_state_getline <dat/delimrecs.txt
==13035== Memcheck, a memory error detector
==13035== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==13035== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==13035== Command: ./bin/parse_str_state_getline
==13035==
each[ 0] : (000,P,ray )
each[ 1] : (100,D,ray )
each[ 2] : (009,L,art )
each[ 3] : (0000,C,max )
each[ 4] : (0000,S,ben )
<snip 5 - 34>
each[35] : (020,P,kay )
each[36] : (040,L,photography )
each[37] : (001,C,max )
each[38] : (0001,S,ben )
each[39] : (0001,P,kay )
==13035==
==13035== HEAP SUMMARY:
==13035== in use at exit: 0 bytes in 0 blocks
==13035== total heap usage: 46 allocs, 46 frees, 2,541 bytes allocated
==13035==
==13035== All heap blocks were freed -- no leaks are possible
==13035==
==13035== For counts of detected and suppressed errors, rerun with: -v
==13035== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
This is a lot to take in, but this is a basic minimal example of the framework for handling an unknown number of objects.